Intelligent agents are expected to generate plausible predictions and explanations
in partially unknown and highly dynamic environments. Thus, they should be
able to retract old conclusions in light of new evidence and to efficiently manage
wide fluctuations of uncertainty. Neither mathematical logic nor numerical
probability fully accommodates these requirements.

In this dissertation I propose a formalism that facilitates reasoning with
qualitative rules, facts, and deductively closed beliefs (as in logic), yet permits us to
retract beliefs in response to changing contexts and imprecise observations (as in
probability). Domain knowledge is encoded as if-then rules admitting exceptions
with different degrees of abnormality, and queries specify contexts with different
levels of precision. I develop effective procedures for testing the consistency of
such knowledge bases and for computing whether (and to what degree) a given
query is confirmed or denied. These procedures require a polynomial number
of propositional satisfiability tests and hence are tractable for Horn expressions.
Finally, I show how to give rules causal character by enforcing a Markovian
condition of independence. The resulting formalism provides the necessary machinery
for embodying belief updates and belief revision, generating explanations, and
reasoning about actions and change.
CHAPTER 1


Introduction 


In their everyday interactions with the world, people continuously jump to
conclusions on the basis of imperfect and defeasible information. For example, we
normally expect to find our car where we parked it last, and upon turning the
ignition key, we expect the engine to start. These expectations are plausible but
not provable from what is known at the time they are assessed, and they may
be replaced as new evidence is encountered. A stolen car will not be where we
parked it last. An engine with a dead battery will not start. Yet, despite the
multitude of possible scenarios, people operate under a fairly uniform consensus
as to what is plausible, that is, what should be upheld as true for practical
purposes. This suggests that there are simple principles that govern the dynamics of
plausible reasoning, including the distinction between plausible and implausible
conclusions.

This dissertation is concerned with casting the principles governing plausible
reasoning in a formal language. Through such formalization we wish to create
programs capable both of accepting and organizing input, ranging from defeasible
information such as “typically, if we turn the car’s ignition the engine starts”
to nondefeasible (strict) information such as “all humans are mortal”, and of
answering queries about what would be a plausible conclusion given some particular
context. Figure 1.1 presents a schematic of this project. The set of “if $\varphi_i$ then
$\psi_i$” rules, $\varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i$, represents a knowledge base encoding information about the
world. The incompleteness of this information is modeled by allowing exceptions
to these rules, where $\delta_i$ represents the degree of abnormality of these exceptions.
Rules expressing what is normally the case, without excluding the possibility of
exceptions, are commonly known in Artificial Intelligence (AI) as default rules.1
A query is a pair $(\phi, \sigma)$ representing the context $\phi$ and the target $\sigma$. The
context of a query contains factual information on what is currently known about
the environment, which may originate from either passive observations or
active manipulations.2 The target $\sigma$ is a propositional hypothesis representing the

1In  the  database  literature,  these  rules  play  the  role  of  integrity  constraints,  but  are  normally 
treated  as  hard  laws,  tolerating  no  exceptions  [111]. 

2This  distinction  is  of  crucial  importance,  as  shown  in  Section  5.4. 




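[Figure 1.1 (schematic): a knowledge base $\Delta$ of rules $\varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i$, $i = 1, \ldots, n$, together with a query $(\phi, \sigma)$ whose context $\phi$ comes from actions or observations, yields a verdict that $\sigma$ is confirmed or denied (and to what degree).]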


Figure  1.1:  Schematic  of  the  system  proposed. 

agent’s current interest. The output is a decision about whether $\sigma$ is plausible
(and to what degree) given that $\phi$ is true.

Recently, defaults have been proposed in AI for expressing commonsense
knowledge [109, 45]. Inheritance hierarchies, for instance, encode the prototypical
properties of classes as defaults. In reasoning about change, defaults encode
the tendency of properties to remain invariant in the absence of relevant changes.
In diagnostic reasoning, defaults encode the rarity of faulty components. Even
deductive databases usually embed default “closed-world” assumptions to fill in
missing information. In short, any activity for which we cannot afford to specify
in advance all responses to all conceivable situations seems to require the use of
defaults.

Initial attempts in AI to formalize plausible reasoning based on default rules
favored extensions of classical logic [108, 87, 88, 90] to account for the
nonmonotonicity of default-based inferences.3 Deduction in classical logic is monotonic:
given that $C$ is entailed by a theory $T$, $C$ is also entailed by any theory $T'$ that
is a superset of $T$. Inferences based on default rules, on the other hand, are
nonmonotonic. For example, I believe (infer) that my car’s engine will start ($C$)
once I turn the ignition key ($T$), but I would like to retract this belief (and
infer that it will not start, i.e., $\neg C$) if I learn that the battery is dead ($T'$). Although these
extensions successfully reproduced this nonmonotonic behavior, the interactions
among default rules yield conflicting and sometimes counterintuitive conclusions.

3These  formalisms  are  reviewed  in  Section  1.2. 
For example, consider a knowledge base containing the defaults “typically
penguins don’t fly” and “typically birds fly”, and the nondefeasible rule “all penguins
are birds”. Given that Tweety is a penguin, we may conclude that Tweety does
not fly based on the information provided by the first default. On the other hand,
since Tweety is a penguin, she is also a bird, and so we may conclude that Tweety flies
using the second default rule. We prefer to uphold the conclusion that
Tweety does not fly because of the intuition that defaults providing information
about a more specific class of individuals (penguins, in this case) should be
given higher priority. There are other cases of default interactions,
and other criteria for avoiding undesirable inferences, based on assumptions of minimal
change and on notions of causality and explanation [57, 36, 8, 45, 121, 54, 118].
Some proposals address the problem of default interactions by asking the user to
explicitly specify preferences among rules (e.g., [112, 30, 88, 80]). Ideally,
however, such information should be extracted from the rules themselves (or their
semantical interpretation), since anticipating interactions among defaults becomes
increasingly difficult as the size of the knowledge base grows. Furthermore, user
specification of these preferences seems to require a possibly exhaustive
enumeration of cases, situations, and exceptions, which is precisely what the use of
default rules was meant to avoid. A related problem is the observation that some
plausible conclusions are harder to retract than others in the face of conflicting
evidence. This observation suggests that semantical interpretations of plausible
beliefs should involve rankings or orderings among these beliefs (in addition to
truth values) [33, 34, 120].

On the positive side, a formalization in terms of a logical framework offers
several advantages, such as independence from any specific implementation or
domain and the possibility of model-theoretic interpretations and well-founded
semantics.

At the other extreme, an alternative to extending classical logic may be
probability theory: uncertainty can be used to represent the incompleteness
of the information in the knowledge base, and numbers can be used to model
rankings of beliefs. Furthermore, Bayesian conditioning offers a successful and
well-understood method of dealing with retractions and belief change. Yet a
straightforward probabilistic interpretation of plausibility in terms of numbers
and thresholds will encounter obstacles of its own. First, the picture we form
about our environment seems to be encoded in terms of plain beliefs, that is,
propositions that are accepted as true (for practical purposes) and continue to
guide our actions until refuted by new evidence. These propositions are
transmitted linguistically, and are qualified by expressions such as “generally”,
“extremely typical”, and “very likely”, which are void of precise numerical value.
Second, plain beliefs also seem to be deductively closed: If $\varphi$ is believed and $\psi$ is
believed, then $\varphi \land \psi$ is believed as well. Note that if we associate the acceptance
of $\varphi$ as a believed proposition with $P(\varphi) > t$, where $t$ is some suitable threshold,
it is possible to have both $P(\varphi) > t$ and $P(\psi) > t$ but $P(\varphi \land \psi) < t$.
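
The failure of deductive closure under threshold acceptance can be checked with a small numerical sketch. The probabilities (0.9 each), the independence assumption, and the threshold 0.85 are illustrative choices and do not come from the text.

```python
from itertools import product

# Hypothetical numbers: two independent propositions, each with probability 0.9,
# and an acceptance threshold t = 0.85.
P_PHI, P_PSI, THRESHOLD = 0.9, 0.9, 0.85

# Joint distribution over the four truth assignments (phi, psi), assuming independence.
joint = {
    (phi, psi): (P_PHI if phi else 1 - P_PHI) * (P_PSI if psi else 1 - P_PSI)
    for phi, psi in product([True, False], repeat=2)
}

def prob(event):
    """Probability of the set of worlds (phi, psi) that satisfy `event`."""
    return sum(p for world, p in joint.items() if event(*world))

def accepted(event, t=THRESHOLD):
    """Threshold acceptance: the event is believed iff its probability exceeds t."""
    return prob(event) > t

believes_phi = accepted(lambda phi, psi: phi)            # P(phi) is about 0.9
believes_psi = accepted(lambda phi, psi: psi)            # P(psi) is about 0.9
believes_both = accepted(lambda phi, psi: phi and psi)   # P(phi and psi) is about 0.81
```

With these numbers each proposition is accepted on its own, yet their conjunction falls below the threshold, so the accepted set is not closed under conjunction.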

In this thesis, I propose a conditional interpretation of the default rules that
offers the merits of both logic and probability. The sentence “if $\varphi$ then $\psi$”
is interpreted as imposing a preference for accepting $\psi$ over $\neg\psi$ if $\varphi$ is all that
is known. This interpretation is based on an abstraction of probability theory
where “if $\varphi$ then $\psi$” constrains the conditional probability of $\psi$ given $\varphi$ to be
infinitesimally close to 1. Intuitively, this amounts to according the consequence
$\psi$ a very high likelihood when $\varphi$ is all we know.4 As will be seen, at the heart of
this formulation is the concept of default priorities, namely, a natural ordering
of the rules which is derived automatically from the knowledge base.
Representing and reasoning with causal relations is enabled through a stratified set of
(probabilistic) independencies based on Markovian considerations. The result is
a model-theoretic and semantically well-founded account of plausible beliefs that,
as in classical logic, are qualitative and deductively closed and, as in probability,
are subject to retraction and to varying degrees of firmness.
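
The reading of a default as a conditional probability "infinitesimally close to 1" can be illustrated with a small sketch. It assumes a hand-picked ranking of worlds by degree of abnormality for the penguin example and gives each world a weight `eps ** rank`. The ranking values and the names `rank`, `weight`, and `conditional` are hypothetical; they are not derived by the procedures developed in later chapters.

```python
import math
from itertools import product

ATOMS = ("penguin", "bird", "flies")
WORLDS = [dict(zip(ATOMS, values)) for values in product([True, False], repeat=len(ATOMS))]

def rank(world):
    """Hand-picked degree of abnormality of a world (an assumption for illustration)."""
    if world["penguin"] and not world["bird"]:
        return math.inf   # violates the strict rule "all penguins are birds"
    if world["penguin"] and world["flies"]:
        return 2          # violates "typically penguins don't fly"
    if world["bird"] and not world["flies"]:
        return 1          # violates "typically birds fly"
    return 0              # nothing abnormal

def weight(world, eps):
    """Unnormalized probability eps ** rank; impossible worlds get weight 0."""
    k = rank(world)
    return 0.0 if math.isinf(k) else eps ** k

def conditional(consequent, antecedent, eps):
    """P(consequent | antecedent) under the weights above."""
    num = sum(weight(w, eps) for w in WORLDS if antecedent(w) and consequent(w))
    den = sum(weight(w, eps) for w in WORLDS if antecedent(w))
    return num / den

flies = lambda w: w["flies"]
not_flies = lambda w: not w["flies"]
bird = lambda w: w["bird"]
penguin = lambda w: w["penguin"]

# As eps shrinks, both conditional probabilities approach 1.
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, conditional(flies, bird, eps), conditional(not_flies, penguin, eps))
```

Under this ranking both "birds fly" and "penguins don't fly" receive conditional probabilities that tend to 1 as `eps` tends to 0, even though every penguin is a bird: the more specific rule prevails because the worlds that violate it are ranked as more abnormal.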


1.1  Overview  and  Summary  of  Contributions 

Attaching probabilistic semantics to conditional sentences (i.e., if-then
expressions) goes back to Adams [1, 2], who developed a logic of indicative conditionals
based on infinitesimal probabilities. This logic includes a norm of consistency,
called p-consistency, which tolerates exceptions (e.g., “typically, if $\varphi$ then $\psi$ and
if $\varphi \land \varphi'$ then $\neg\psi$”) and rules out contradictions (e.g., the pair “typically, if $\varphi$
then $\psi$” and “typically, if $\varphi$ then $\neg\psi$”). It also admits a notion of entailment,
called p-entailment, which guarantees arbitrarily high probabilities for the
conclusions whenever sufficiently high probabilities can be consistently assigned to
the premises.

Unfortunately, Adams’ notions of p-consistency and p-entailment were
restricted to knowledge bases containing only defeasible information. The first
contribution of this thesis is the extension of Adams’ consistency and entailment
to handle both defeasible and strict information (Chap. 2). Strict information
is essential for representing definitional or taxonomic information (e.g., “all men
are mortal”, “penguins are birds”), and incorporating such information into the

4For a different probabilistic interpretation of the default rules, see Neufeld and Poole [93].
For a statistical interpretation, see Bacchus [6].




knowledge base requires nontrivial changes in the notions of both consistency
and entailment. This extension cannot be accomplished by simply treating
the strict conditional $\phi \Rightarrow \sigma$ as the material implication $\phi \supset \sigma$. For example,
whereas the pair $\{b \supset f,\ b \supset \neg f\}$ is logically consistent (it is satisfied whenever $b$
is false), the desired semantics
should render the set $\{b \Rightarrow f,\ b \Rightarrow \neg f\}$ inconsistent. In Chapter 2, I provide
a probabilistic semantics for strict conditionals $\phi \Rightarrow \sigma$ as constraints on
admissible probability functions, forcing the conditional probability of $\sigma$ given $\phi$ to be
equal to 1. I then establish effective decision procedures for testing both
consistency and entailment in knowledge bases containing mixtures of defeasible and
strict information. Procedures for reasoning with inconsistent knowledge bases
and ways of uncovering the set of rules responsible for the inconsistency are also
examined.5
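
The contrast between the two readings of the pair of rules about $b$ and $f$ can be sketched by brute force over truth assignments. The function `antecedents_supported` is a simplified stand-in, introduced here only for illustration, for the requirement that each strict conditional be conditioned on an antecedent of nonzero probability. It is not the consistency procedure of Chapter 2.

```python
from itertools import product

ATOMS = ("b", "f")
WORLDS = [dict(zip(ATOMS, values)) for values in product([True, False], repeat=len(ATOMS))]

# Each rule is (antecedent, consequent); both are predicates on a world.
RULES = [
    (lambda w: w["b"], lambda w: w["f"]),       # b => f
    (lambda w: w["b"], lambda w: not w["f"]),   # b => not f
]

def satisfies_material(world, rules):
    """True if the world satisfies every rule read as a material implication."""
    return all((not ante(world)) or cons(world) for ante, cons in rules)

def materially_consistent(rules):
    """Classical consistency: some world satisfies all the material implications."""
    return any(satisfies_material(w, rules) for w in WORLDS)

def antecedents_supported(rules):
    """Simplified conditional check (an assumption of this sketch): every rule needs
    a world where its antecedent is true and no rule is violated, because
    P(consequent | antecedent) = 1 presupposes an antecedent of nonzero probability."""
    return all(
        any(ante(w) and satisfies_material(w, rules) for w in WORLDS)
        for ante, _ in rules
    )

classical_verdict = materially_consistent(RULES)     # True: any world with b false works
conditional_verdict = antecedents_supported(RULES)   # False: no world with b true survives
```

The classical check succeeds only because $b$ can be made false, whereas the conditional check fails because no world makes $b$ true without violating one of the two rules.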

The  second  contribution  is  the  formalization  and  characterization  of  more 
powerful  notions  of  entailment  (Chaps.  3  and  4).  Default  reasoning  requires  two 
facilities:  One  forcing  retraction  of  conclusions  in  light  of  new  refuting  evidence 
(e.g.,  once  we  learn  that  the  battery  is  dead,  we  no  longer  expect  the  engine 
to  start);  the  other  protecting  conclusions  from  retraction  in  light  of  new  but 
irrelevant  evidence  (e.g.,  the  color  of  the  car  should  not  affect  inferences  regarding 
ignition  keys,  batteries,  or  engines).  p-Entailment  excels  in  the  first  task,  but 
fails  on  the  second  because  it  is  extremely  cautious;  it  only  sanctions  conclusions 
that  attain  high  probability  in  all  probability  distributions  p-consistent  with  the 
knowledge  base.  In  order  to  respect  the  communication  convention  that,  unless 
stated  explicitly,  properties  are  presumed  to  be  irrelevant  to  each  other,  we  must 
consider  only  distributions  that  minimize  dependencies,  that  is,  they  contain  only 
the  dependencies  that  are  absolutely  implied  by  the  knowledge  base  (and  none 
others). 

Chapter 3 details an extension of p-entailment where dependencies are
minimized via the principle of maximum entropy.6 Chapter 3 also provides symbolic
procedures for answering queries based on this principle, without the explicit
computation of the maximum entropy distribution. A second extension of
p-entailment, called system-Z+, is presented in Chapter 4. System-Z+ restricts the
set of probability distributions to those that assign to each model the highest
possible likelihood consistent with the default rules. The behavior of these two
formalisms is compared, and new insights on how semantical features influence the
plausibility of the resulting theories are discussed. The approach based on
maximum entropy yields more intuitive conclusions in some domains, but system-Z+

5The  results  in  this  chapter  were  originally  reported  in  Goldszmidt  and  Pearl  [49]. 

6The  use  of  maximum  entropy  in  default  reasoning  as  an  extension  of  p-entailment  was 
proposed  by  Pearl  in  [97]. 
provides considerable computational advantages. An earlier version of Chapter 3
can be found in Goldszmidt et al. [47], while preliminary versions of the results
in Chapter 4 were first reported in Goldszmidt and Pearl [50, 53]. These two
formalisms, maximum entropy and system-Z+, are completely independent of each
other, and Chapter 3 is not a prerequisite for understanding Chapter 4.

The third contribution of this thesis is the development of a semantical
theory and a computational facility for reasoning with variable-strength defaults
(Sec. 4.3) and soft or imprecise evidence (Sec. 4.5). The capability to reason with
variable-strength defaults is necessary in domains such as diagnosis, where the
analyst may feel strongly that failures are more likely to occur in one type of
device (e.g., multipliers) than in another (e.g., adders). The capability of
processing soft evidence is important when the context $\varphi$ (of a query) is not given with
absolute certainty, that is, when there is some vague testimony supporting $\varphi$ but
that testimony is undisclosed (or cannot be articulated using the basic
propositions in our language, e.g., testimony of the senses), so that only a summary of
that testimony saying that “$\varphi$ is supported to a degree $n$” can be ascertained.

The introduction of graded defaults and soft evidence requires new
query-answering machinery which, in the traditional probabilistic setting, turned out to
be intractable.7 This thesis shows that the symbolic nature of system-Z+ admits
a more manageable class of procedures: they require a polynomial number of
propositional satisfiability tests and are therefore tractable for Horn expressions.

Augmenting the proposed semantics with the capability to represent causal
relations, actions, and reasoning about change is the final contribution of this
thesis (Chap. 5). This is accomplished by invoking the principle of Markov shielding,
which imposes a stratified set of (probabilistic) independences among events.
Informally, the principle can be stated as follows:

Knowing the set of causes for a given effect renders the effect
independent of all prior events.

I  show  how  the  incorporation  of  this  principle  gives  rise  to  a  norm  of  consistency, 
applicable  to  knowledge  bases  representing  causal  relations,8  and  how  it  solves 
some  of  the  common  problems  associated  with  tasks  of  prediction  and  explanation 
reported  in  the  nonmonotonic  literature. 

7Most logic-based schemes for default reasoning were also shown to be highly intractable [55],
residing in the $\Sigma_2^p$ level of the complexity hierarchy, compared with $\Delta_2^p$ in our system.

8To  the  best  of  my  knowledge,  this  is  the  first  consistency  criterion  devised  to  ensure  the 
coherence  of  causal  theories. 
In Section 5.4, I demonstrate how the framework proposed in this dissertation
can embody and unify the theories of belief revision (see Alchourrón, Gärdenfors,
and Makinson [3]) and belief updating (see Katsuno and Mendelzon [65]), two
theories of belief change that have been developed independently of research in
default reasoning and causal reasoning. Basically, theories of belief change seek
general principles for constraining the process by which a rational agent ought
to incorporate a new piece of information $\phi$ into an existing set of beliefs $\psi$,
regardless of how the two are represented and manipulated. Belief revision deals
with information obtained through new observations in a static world, while belief
update deals with tracing changes in an evolving world (subjected perhaps to the
external influence of actions).9 I show that both revision and update can be
modeled within the same framework using a qualitative version of probabilistic
conditioning.

Finally, in Chapter 6 I discuss some open problems and suggest further
challenges.


1.2  Extensional  and  Conditional  Approaches 

Approaches for formalizing defeasible reasoning can be loosely categorized as
either extensional or conditional, depending on the interpretation assigned to the
rule $\varphi \rightarrow \psi$.10 Extensional approaches are based on “extending” classical logic
by using defaults as rules for augmenting the set of beliefs in the absence of
conflicting evidence (see [87, 108, 89, 90, 45, 109, 38]). These approaches regard
the default “if $\varphi$ then $\psi$” as a qualified license to believe $\psi$ given the truth of $\varphi$.
Conditional approaches, on the other hand, interpret the same rule as a hard but
context-dependent constraint to prefer $\psi$ over $\neg\psi$ when $\varphi$ is all that is known
(see [36, 38, 69, 74, 23, 101, 14, 49, 50]). Conditional approaches are generally
related to conditional logics studied in philosophy.11

As mentioned above, extensional approaches produced systems that exhibit
many aspects of nonmonotonicity, thus allowing the retraction of conclusions in
light of new information. However, they yield ambiguous results when confronted
with conflicting defaults, with no way of distinguishing the intended from the
unintended conclusions (see [112, 57]). In order to impose preferences and prevent
generation of undesired inferences, special mechanisms must be devised to permit
9Preliminary versions of this chapter can be found in Goldszmidt and Pearl [54, 52].

10Not  all  formalisms  can  be  categorized  as  either  extensional  or  conditional.  The  approach 
based  on  multivalued  logics  proposed  by  Ginsberg  [46]  is  one  such  example. 

11For a survey of conditional logics, see [94].




the specification of such preferences in the extensional approaches. These include,
for example, special nonnormal defaults in default logic [30] (see Sec. 1.2.1) and
priority-driven minimizations in circumscription [88] (see Sec. 1.2.2). Such
mechanisms conflict with the original intent of default inference systems, since they
require a possibly exhaustive enumeration of the exceptions to each default rule
and/or an omniscient user capable of predicting and prioritizing all conceivable
interactions among default rules.

In contrast, proposals based on conditional approaches have proven
successful in enforcing the desired preferences in cases of conflicting defaults (see
Examples 2.1 and 2.2). These preferences stem automatically from the semantical
interpretation of the default rules, based either on an infinitesimal abstraction
of probability theory or on rankings among possible worlds.12 Unfortunately, the
initial versions of these conditional formalisms failed to sanction some desirable
patterns of inference that are readily sanctioned in common discourse [36, 97].
Their greatest limitation stems from the failure to properly handle irrelevant
information. Recent extensions such as Delgrande’s [23], Lehmann and Magidor’s
rational closure [74], and Pearl’s system-Z [100] were successful in capturing
some aspects of irrelevance, but are still unable to handle some cases, for
example, property inheritance across exceptional subclasses (see Chapter 3). Solving
these problems is one of the main contributions of this dissertation (Chaps. 3
and 4). Sections 3.2, 3.4, and 3.6 contain comparisons to Lehmann and
Magidor’s work.13 System-Z is a special case of system-Z+, developed in
Chapter 4. Other conditional approaches described in the literature are Geffner’s
conditional entailment [36, 38] and Boutilier’s modal logic CO* [14]. Conditional
entailment is one of the most powerful formalisms for closing the gap between
conditional and extensional approaches. It is reviewed in Section 4.7. Boutilier
proves the equivalence between CO* and the notions of p-consistency and
p-entailment [14]. He also axiomatizes system-Z in terms of Levesque’s
“only knowing” formulation [75], and proves an interesting relation between the
rule priorities of system-Z and the epistemic entrenchment of AGM [3, 33] (see
Sec. 4.6).

Reiter’s  default  logic  [108],  McCarthy’s  circumscription  [87,  88],  and  Moore’s 
autoepistemic  logic  [90]  are  reviewed  next.14  These  three  extensional  approaches 
constituted  the  state  of  the  art  in  the  field  when  I  began  this  research  project. 

12As it turns out, these interpretations are practically equivalent (see [69] and also Chapter 3).

13Delgrande’s  [23]  work  is  compared  to  Lehmann  and  Magidor’s  in  [69]. 

14The  descriptions  in  Secs.  1.2.1,  1.2.2,  and  1.2.3  are  not  to  be  taken  as  detailed  accounts  of 
these  formalisms.  The  reader  is  encouraged  to  examine  the  surveys  in  [45,  109]  and  to  consult 
the  relevant  papers  listed. 
This review should highlight the parameters researchers use to judge progress in
the  field  and  clarify  the  significance  of  the  contributions  of  this  dissertation. 


1.2.1  Reiter’s  Default  Logic 


The extension proposed by Reiter [108] is based on augmenting classical
first-order logic with inference rules of the form

$$\frac{\alpha(x) : \beta(x)}{\gamma(x)} \tag{1.1}$$

where $\alpha(x)$, $\beta(x)$, and $\gamma(x)$ are well-formed formulas (wffs) with free variables
among those in $x$. The formula $\alpha(x)$ is called the precondition, $\beta(x)$ is called
the test condition, and $\gamma(x)$ is the consequent of the default. Given a tuple of
ground terms $a$, the rule in Eq. 1.1 allows us to conclude $\gamma(a)$ given that $\alpha(a)$ is
believed and provided that $\beta(a)$ is consistent with the current set of beliefs. A
default theory $T = (W, D)$ is composed of a set $W$ of wffs and a set $D$ of default
rules of the form specified by Eq. 1.1. Thus, given the theory


$$T = \left(\{bird(Tweety)\},\ \left\{\frac{bird(x) : flies(x)}{flies(x)}\right\}\right), \tag{1.2}$$


we can derive $flies(Tweety)$. However, if $\neg flies(Tweety)$ can be established, for
example, by augmenting $W$ with

$$dead(Tweety) \supset \neg flies(Tweety) \quad\text{and}\quad dead(Tweety),$$

the rule is blocked, and $flies(Tweety)$ is no longer a conclusion. Thus,
nonmonotonicity is achieved by means of the consistency check required by the rules. Note
that different rules also interact through this consistency check. For example,
the theory


$$T = \left(\{penguin(Tweety),\ penguin(x) \supset bird(x)\},\ \left\{\frac{penguin(x) : \neg flies(x)}{\neg flies(x)},\ \frac{bird(x) : flies(x)}{flies(x)}\right\}\right) \tag{1.3}$$

yields two possible extensions: one in which the first rule is blocked and Tweety
flies, and the other in which the second rule is blocked and Tweety does not fly.
Extensions are formally defined as follows: Let us say that $\Gamma(S)$ expands a set of
wffs $S$ according to $T$ if $\Gamma(S)$ denotes the minimal deductively closed set of wffs
which includes $W$ and every consequent $\gamma$ of default rules of the form $\alpha : \beta / \gamma$
in $D$ for which $\alpha \in \Gamma(S)$ and $\neg\beta \notin S$. An extension of $T$ is a collection $E$ of wffs
such that $E = \Gamma(E)$.
A default theory can give rise to one, none, or many extensions, and each
extension  is  intended  to  reflect  a  possible  completion  of  the  classical  theory  W 
according  to  the  rules  in  D.  The  natural  encoding  of  a  body  of  knowledge  in  the 
form  of  a  default  theory  often  gives  rise  to  unreasonable  extensions,  which  must 
be  pruned  (usually  by  the  user)  by  properly  selecting  the  test  conditions  of  the 
defaults  [112,  30,  29].  Thus,  for  example,  in  Eq.  1.3  the  second  default  rule  can 
be  changed  to  read 

$$\frac{bird(x) : flies(x) \land \neg penguin(x)}{flies(x)} \tag{1.4}$$

and  in  general,  the  test  condition  should  enumerate  all  anticipated  exceptions. 
Default  rules  (such  as  the  one  in  Eq.  1.4)  in  which  the  test  condition  is  not  equal 
to  the  consequent  are  commonly  known  as  nonnormal  defaults  [30,  29]. 
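
The fixed-point definition of an extension, and the pruning effect of a nonnormal default, can be sketched by brute force on the ground (propositional) Tweety theory. This is an illustrative guess-and-check procedure, not an algorithm from the text: deductive closure is represented by sets of satisfying truth assignments, and candidates are generated from subsets of default consequents, relying on the standard result that every extension is the closure of $W$ plus the consequents of the defaults applied in it. All function and variable names are hypothetical.

```python
from itertools import combinations, product

ATOMS = ("penguin", "bird", "flies")
WORLDS = [dict(zip(ATOMS, values)) for values in product([True, False], repeat=len(ATOMS))]

def models(formulas):
    """Indices of the worlds satisfying every formula (stands in for deductive closure)."""
    return frozenset(i for i, w in enumerate(WORLDS) if all(f(w) for f in formulas))

def entails(model_set, formula):
    return all(formula(WORLDS[i]) for i in model_set)

def gamma(facts, defaults, candidate):
    """Models of Gamma(S), where `candidate` is the model set of S."""
    formulas, applied, changed = list(facts), set(), True
    while changed:
        changed = False
        for name, pre, test, cons in defaults:
            if name in applied:
                continue
            believed = entails(models(formulas), pre)                        # alpha in Gamma(S)
            consistent = not entails(candidate, lambda w, t=test: not t(w))  # not-beta not in S
            if believed and consistent:
                formulas.append(cons)
                applied.add(name)
                changed = True
    return models(formulas)

def extensions(facts, defaults):
    """Names of the generating defaults of each extension E = Gamma(E)."""
    found = {}
    for size in range(len(defaults) + 1):
        for chosen in combinations(defaults, size):
            candidate = models(list(facts) + [d[3] for d in chosen])
            if gamma(facts, defaults, candidate) == candidate:
                found[candidate] = frozenset(d[0] for d in chosen)
    return set(found.values())

# W for the ground instance of Eq. 1.3: penguin(Tweety) and penguin implies bird.
FACTS = [lambda w: w["penguin"], lambda w: (not w["penguin"]) or w["bird"]]
# Defaults as (name, precondition, test condition, consequent).
PENGUINS_DONT_FLY = ("penguins-dont-fly", lambda w: w["penguin"],
                     lambda w: not w["flies"], lambda w: not w["flies"])
BIRDS_FLY = ("birds-fly", lambda w: w["bird"],
             lambda w: w["flies"], lambda w: w["flies"])
BIRDS_FLY_NONNORMAL = ("birds-fly", lambda w: w["bird"],
                       lambda w: w["flies"] and not w["penguin"], lambda w: w["flies"])

normal = extensions(FACTS, [PENGUINS_DONT_FLY, BIRDS_FLY])            # theory of Eq. 1.3
pruned = extensions(FACTS, [PENGUINS_DONT_FLY, BIRDS_FLY_NONNORMAL])  # with the Eq. 1.4 default
```

In this sketch the normal theory has two extensions, one per default, while replacing the second default by its nonnormal version leaves only the extension in which Tweety does not fly.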

On the positive side, Reiter’s default logic extends classical first-order logic
with nonmonotonic capabilities by means of a formal yet simple device, that is,
by treating default rules as special rules of inference. Of all the extensional
approaches, default logic appears to be the most stable: most work on default logic
focuses on applying rather than modifying Reiter’s original ideas [45]. Recent
work extending default logic and solving some of its shortcomings can be found
in [17, 24, 43]. Work on the computational complexity of default logic is reported
in [66, 10], and the relation between default logic and formal semantics for logic
programming is studied in [42, 11]. Default logic is compared to $\varepsilon$-semantics (a
conditional approach underlying the development of Chap. 2) in [97].

1.2.2  McCarthy’s  Circumscription 

Circumscription  minimizes  the  extensions  of  various  predicates  in  a  given  theory, 
thereby  providing  a  closed  world  view  of  their  interpretations.  This  formalism 
is  best  understood  from  a  model-theoretic  perspective.  Let  A(P)  denote  a  first 
order  sentence  containing  the  predicate  P.  In  classical  logic,  a  wff  ip  is  said 
to  be  entailed  by  A(P)  if  ip  is  true  in  every  model  for  A(P).  Circumscription 
weakens  this  condition:  ip  is  entailed  by  Circ[A(P);  P]  (to  read:  “the  circum¬ 
scription  of  P  in  A{P)V)  if  ip  is  true  in  every  model  of  A(P )  which  is  minimal 
in  P  [88,  78].  A  model  M  is  minimal  in  P  when  there  is  no  other  model  that 
assigns  a  strictly  smaller  extension  to  P  and  that  preserves  from  M  the  same 
domain  and  the  same  interpretation  of  symbols  other  than  P.  Thus,  given  a  set 
of  axioms,  circumscription  selects  a  minimal  interpretation  for  some  predicate(s) 
subject to the constraints imposed by the axioms. As the set of axioms changes, so
does the minimal interpretation that circumscription selects. Consequently,
the set of inferred conclusions can shrink as new information arrives, and the
desired property of nonmonotonicity is attained. For instance, given a knowledge
base containing the fact $penguin(Tweety)$, the circumscription of $penguin$
yields the formula $\forall x.\, penguin(x) \supset x = Tweety$. If Opus is an object
different from Tweety, circumscription will allow us to jump to the conclusion that
$\neg penguin(Opus)$. If $penguin(Opus)$ is learned, the circumscription of $penguin$
will now yield $\forall x.\, penguin(x) \supset (x = Tweety \lor x = Opus)$.
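
The Tweety/Opus example can be replayed by brute force. The following is a minimal sketch, assuming a fixed two-element domain and representing a candidate extension of $penguin$ as a Python set; the helper name `minimal_extensions` and the two axiom functions are hypothetical names introduced only for this illustration.

```python
from itertools import combinations

def minimal_extensions(domain, axiom):
    """Return the subset-minimal extensions of one unary predicate
    (here: penguin) among those extensions that satisfy `axiom`."""
    candidates = [frozenset(c)
                  for r in range(len(domain) + 1)
                  for c in combinations(domain, r)
                  if axiom(frozenset(c))]
    # Keep an extension only if no other admissible extension is a proper subset.
    return [e for e in candidates if not any(o < e for o in candidates)]

domain = ["Tweety", "Opus"]           # assumed finite domain for the sketch

# Knowledge base 1: penguin(Tweety).
kb1 = lambda penguin: "Tweety" in penguin
# Knowledge base 2: penguin(Tweety) and penguin(Opus).
kb2 = lambda penguin: {"Tweety", "Opus"} <= penguin

before = minimal_extensions(domain, kb1)
after = minimal_extensions(domain, kb2)

# Opus is a non-penguin in every minimal model of kb1, but not of kb2.
opus_excluded_before = all("Opus" not in e for e in before)
opus_excluded_after = all("Opus" not in e for e in after)
```

Under these assumptions the conclusion $\neg penguin(Opus)$ holds in every minimal model of the first knowledge base and is withdrawn once $penguin(Opus)$ is added, which is the nonmonotonic behavior described above.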

Syntactically, the circumscription $Circ[A(P); P]$ of $P$ in $A(P)$ can be
expressed as the second-order schema [87]

$$A(P) \land A(\Phi) \land \forall x.[\Phi(x) \supset P(x)] \supset \forall x.[P(x) \supset \Phi(x)], \tag{1.5}$$

where $A(\Phi)$ denotes the logical sentence that results from replacing all the
occurrences of $P$ by a predicate $\Phi$ with the same arity as $P$. Eq. 1.5 can be understood
as stating that among the predicates $\Phi$ that satisfy the constraints in $A(\Phi)$, $P$ is
the strongest; in other words, the objects that satisfy a predicate $P$ are exactly
the objects that can be shown to satisfy $P$.

Circumscription adds nonmonotonic features to first-order logic but does not
specify how defeasible knowledge should be encoded. McCarthy [88] introduced
a convention by which defaults such as “birds fly” are written as

$$\forall x.\, bird(x) \land \neg ab_i(x) \supset flies(x) \tag{1.6}$$

and read as “every non-abnormal bird flies”. Thus, given a set of these
defaults, the expected behavior follows from minimizing the abnormalities, that is,
from circumscribing the $ab$ predicates. Note, however, that given Eq. 1.6 and
$bird(Tweety)$, the minimization of $ab_i$ by Eq. 1.5 will not suffice to sanction
$flies(Tweety)$. This happens because the model in which no bird is abnormal
and therefore Tweety flies is competing with a model $M'$ in which $ab_i(Tweety)$
and $\neg flies(Tweety)$ hold, and $M'$ is also minimal with respect to $ab_i$ if we leave all
the other objects constant. To remedy this undesirable situation, McCarthy [88]
proposed a more powerful formula circumscription in which certain other
predicates are allowed to vary, thus allowing the minimization of some predicates at
the expense of others. The circumscription $Circ[A(P, Z); P, Z]$ of the predicate
$P$ in $A(P, Z)$, where $Z$ stands for a tuple of predicates allowed to vary in the
minimization of $P$, is defined as

$$A(P, Z) \land A(\Phi, \Psi) \land \forall x.[\Phi(x) \supset P(x)] \supset \forall x.[P(x) \supset \Phi(x)] \tag{1.7}$$

Note that Eq. 1.7 is stronger than the schema in Eq. 1.5, since, in addition to
substitutions for $P$, Eq. 1.7 permits substitutions for $Z$. The model-theoretic
interpretation of $Circ[A(P, Z); P, Z]$ sanctions as theorems the sentences that
hold in all models for $A(P, Z)$ that are minimal in $P$ with respect to $Z$ [78]. A
model $M$ of $A(P, Z)$ is minimal in $P$ with respect to $Z$ if there are no other
models $M'$ of $A(P, Z)$ that assign a smaller extension to $P$ and that preserve
from $M$ the same domain and the same interpretation of symbols other than $P$
and $Z$. Note that the expected conclusion, $flies(Tweety)$, follows in the example
above by minimizing the $ab_i$ predicate while allowing $flies$ to vary, since the only
minimal models are those in which $\neg ab_i(Tweety)$ holds.
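
The effect of letting $flies$ vary can be checked on a one-object version of the example. This is a small sketch under simplifying assumptions: the domain contains only Tweety, $bird(Tweety)$ is fixed to true, and a model is reduced to the pair (ab, flies); the names `MODELS` and `minimal_in_ab` are hypothetical.

```python
from itertools import product

# Each model records (ab, flies) for Tweety, with bird(Tweety) held true.
# Eq. 1.6 then reduces to: (not ab) implies flies, i.e. ab or flies.
MODELS = [(ab, flies) for ab, flies in product([False, True], repeat=2)
          if ab or flies]

def minimal_in_ab(models, vary_flies):
    """Models that are minimal in ab; flies is held fixed unless vary_flies."""
    def strictly_smaller(m, n):
        # m is preferred to n if ab shrinks from true to false; the two models
        # must agree on flies unless flies is allowed to vary.
        comparable = vary_flies or m[1] == n[1]
        return comparable and not m[0] and n[0]
    return [n for n in models
            if not any(strictly_smaller(m, n) for m in models)]

fixed = minimal_in_ab(MODELS, vary_flies=False)
varying = minimal_in_ab(MODELS, vary_flies=True)

# A conclusion is sanctioned only if it holds in every minimal model.
flies_when_fixed = all(flies for _, flies in fixed)
flies_when_varying = all(flies for _, flies in varying)
```

With $flies$ held fixed, the model in which Tweety is abnormal and does not fly survives as minimal, so $flies(Tweety)$ is not sanctioned; once $flies$ may vary, only the model with $\neg ab_i(Tweety)$ remains.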

The generalization of circumscription to the case of many predicates (known as
parallel circumscription) is straightforward. A more interesting extension is that
of prioritized circumscription, in which the user is allowed to specify a priority
ordering among the predicates to be circumscribed, where predicates with higher
priority are circumscribed (minimized) at the expense of predicates with lower
priority [88, 81]. Thus, for example, if we add to Eq. 1.6

$$\forall x.\, penguin(x) \land \neg ab_j(x) \supset \neg flies(x) \tag{1.8}$$

$$\forall x.\, penguin(x) \supset bird(x) \tag{1.9}$$

$$penguin(Tweety) \tag{1.10}$$

then $\neg flies(Tweety)$ will follow only if we circumscribe $ab_j$ with a higher priority
than $ab_i$. Note that the circumscriptive policy (namely, the predicates to be
minimized, the priority ordering, and the predicates to be allowed to vary) must
be specified by the user.

Circumscription has been extensively studied due to its power and mathematical
tractability. Circumscription shares some of the shortcomings of default
logic: the user remains responsible for establishing preferences among default
rules and for sorting out their possible interactions. Circumscription uses
priorities among predicates in the minimization process to express such preferences.
Lifschitz [80] reports on ways to incorporate the specification of such priorities
into the object language. Efforts directed toward providing guidelines for specific
domains can be found in [82, 8, 70].

1.2.3  Moore’s  Autoepistemic  Logic 

Moore [90] originally proposed autoepistemic logic as a reconstruction of
McDermott and Doyle’s nonmonotonic logic [89]. Autoepistemic logic augments
propositional theories with a belief operator $L$, where sentences of the form $L\varphi$
are read as “$\varphi$ is believed”. A stable expansion of an autoepistemic theory $T$,
$S(T)$, is defined as follows:

$$S(T) = Th(T \cup \{Lp : p \in S(T)\} \cup \{\neg Lp : p \notin S(T)\}) \tag{1.11}$$

where $Th(X)$ stands for the set of tautological consequences of $X$. Stable
expansions are intended to reflect possible states of belief of an ideal rational agent,
closed under both negative and positive introspection [90].

Defaults can be encoded in autoepistemic logic using an $ab$ predicate, as in
circumscription; thus, “typically birds fly” will be written $bird \land \neg L\,ab_i \supset flies$.
Given $bird$, the only autoepistemic expansion will contain $\neg L\,ab_i$ and consequently
the proposition $flies$. An autoepistemic theory may have one, none, or many
stable expansions. For instance, $T = \{\neg Lp \supset p\}$ has no stable expansion, while
$T = \{\neg Lp \supset q,\ \neg Lq \supset p\}$ has two.
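
The three theories just mentioned are small enough to check by a guess-and-check enumeration. The sketch below is illustrative only and rests on a restriction that is an assumption of the sketch, not of the text: the operator $L$ is applied to atoms only, so an expansion is identified with the set of atoms it believes. The function name `stable_guesses` and the encoding of formulas as Python functions of a world and a belief set are hypothetical.

```python
from itertools import product

def stable_guesses(atoms, theory):
    """Return the belief sets B (sets of atoms x with Lx true) such that the
    theory, read with that guess for L, entails exactly the atoms in B.
    Each formula in `theory` is a function f(world, believed) -> bool."""
    worlds = [dict(zip(atoms, v))
              for v in product([False, True], repeat=len(atoms))]
    found = []
    for bits in product([False, True], repeat=len(atoms)):
        believed = {a for a, bit in zip(atoms, bits) if bit}
        models = [w for w in worlds if all(f(w, believed) for f in theory)]
        entailed = {a for a in atoms if all(w[a] for w in models)}
        if entailed == believed:      # the guess reproduces itself
            found.append(believed)
    return found

# T = { not Lp implies p }, written as "Lp or p".
t_none = [lambda w, b: "p" in b or w["p"]]

# T = { not Lp implies q, not Lq implies p }.
t_two = [lambda w, b: "p" in b or w["q"],
         lambda w, b: "q" in b or w["p"]]

# T = { bird, bird and not L ab implies flies }.
t_bird = [lambda w, b: w["bird"],
          lambda w, b: (not w["bird"]) or "ab" in b or w["flies"]]

none_found = stable_guesses(["p"], t_none)
two_found = stable_guesses(["p", "q"], t_two)
bird_found = stable_guesses(["bird", "ab", "flies"], t_bird)
```

Under this restricted reading, the first theory admits no self-reproducing guess, the second admits two (believing only $p$, or only $q$), and the bird theory admits one, in which $ab$ is not believed and $flies$ is.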

Since its introduction, autoepistemic logic has been studied in [86, 67, 41, 91].
It has been successfully applied to characterize the semantics of general logic
programs [40, 42] and of truth maintenance systems [107]. Both characterizations
require only the replacement of logical negation by autoepistemic negation, that
is, literals of the form $\neg p$ are replaced by $\neg Lp$. Levesque [75] provides an
appealing semantics for autoepistemic logic in terms of only knowing (see also [14]).

As  in  the  case  of  default  logic  and  circumscription,  autoepistemic  logic  is 
unable  to  automatically  account  for  preferences  among  defaults  and  resolve  their 
interactions in a satisfactory manner. As we shall see, this problem is solved in
this dissertation by interpreting default rules as preference constraints on the
set of possible situations. The basis for this interpretation is a norm of consistency
to be introduced next in Chapter 2.

CHAPTER 2


The Consistency of Conditional Knowledge Bases


2.1  Introduction 


There  is  a  sharp  difference  between  exceptions  and  outright  contradictions.  The 
two  statements  “typically  penguins  do  not  fly”  and  “red  penguins  fly”  can  be 
accepted  as  a  description  of  a  world  in  which  redness  defines  an  abnormal  or 
exceptional type of penguin. However, the statements $s_1$: “typically birds fly”
and $s_2$: “typically birds do not fly” stand in outright contradiction to each other.
Whatever interpretation we give to “typically”, it is hard to imagine a world
containing birds in which both $s_1$ and $s_2$ would make sense simultaneously.
Curiously, such conflicting pairs of sentences can coexist perfectly in most
nonmonotonic formalisms directed at capturing and characterizing our everyday reasoning
by including such expressions about what is normally the case. For example,
using the $ab$ predicate advocated by McCarthy [88], a straightforward way to
represent such statements in the context of circumscription would be


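$$s_1 : \forall x.\, bird(x) \land \neg ab(x) \supset fly(x); \qquad s_2 : \forall x.\, bird(x) \land \neg ab(x) \supset \neg fly(x), \tag{2.1}$$


which is logically equivalent to $\forall x.\, bird(x) \supset ab(x)$. Similarly, if $s_1$ and $s_2$ are
expressed as the default rules¹

$$s_1 : \frac{bird(x) : M\, fly(x)}{fly(x)}; \qquad s_2 : \frac{bird(x) : M\, \neg fly(x)}{\neg fly(x)}, \tag{2.2}$$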


Reiter’s  default  logic  [108]  will  produce  two  consistent  sets  of  beliefs,  one  in  which 
“birds  fly”  and  one  in  which  “birds  do  not  fly”. 


Normally, a pair such as $s_1$ and $s_2$ would not be used to encode the information
that “all birds are exceptional (or abnormal)” as in the case of circumscription
or to express an ambiguous property² of birds as in the case of default logic.
Rather, this kind of contradictory information is more likely to originate from
an unintentional mistake. Remarkably, although humans readily recognize the
distinction between exceptions, ambiguities, and contradictions, current work on
defeasible knowledge bases presents no comprehensive analysis of such utterances
that could alert the user to the existence of contradictory, possibly unintended
statements. As a first step in formulating a framework for representing and
reasoning with if-then rules admitting exceptions, this chapter proposes a
semantically sound norm for consistency, accompanied by effective procedures for
testing inconsistencies and isolating their origins.

¹ The default rule $\frac{bird(x) : M\, fly(x)}{fly(x)}$ is informally interpreted as “if $x$ is a bird and it is consistent
to assume that $x$ can fly, then infer that $x$ can fly” (see [108]).

² A property $f$ is ambiguous if neither $f$ nor $\neg f$ can be verified from the knowledge base.

It is tempting to assume that pairs such as $s_1$ and $s_2$ constitute the only source
of inconsistency and that once we eliminate such contradictory pairs, the remaining
knowledge base would be consistent, that is, all conflicts could be rationalized
as conveying exceptions or ambiguities. Touretzky [122] has shown that this is
indeed the case in the domain of acyclic and purely defeasible inheritance networks.
However, once the language becomes more expressive, allowing hard rules as well
as arbitrary formulas in the antecedents and consequents of the rules, the
criterion for consistency becomes more involved. Consider the knowledge base $\Delta$ =
{“all birds fly”, “typically penguins are birds”, “typically penguins do not fly”}.
This set of rules, although without contradictory pairs, also strikes us as
inconsistent: If all birds fly, there cannot be a nonempty class of objects (penguins)
that are “typically birds” and yet “typically do not fly”. We cannot accept this
knowledge base as merely depicting exceptions; it looks more like a programming
“bug” or “glitch” than a genuine description of some state of affairs. If we
now change the first sentence to read “typically birds fly” (instead of “all birds
fly”), consistency is restored; we are willing to accept penguins as exceptional
birds. This interpretation would remain satisfactory even if we made the second
rule strict (to read “all penguins are birds”). Yet, if we add to $\Delta$ the sentence
“typically birds are penguins”, we again face intuitive inconsistency.

In this chapter we propose a probability-based formalism that captures these
intuitions. We will interpret a defeasible rule “typically, if $\varphi$ then $\psi$” (written
$\varphi \to \psi$) as the conditional probability statement $P(\psi \mid \varphi) \geq 1 - \varepsilon$, where $\varepsilon > 0$ is
an infinitesimal quantity. Intuitively, this amounts to according the consequent
$\psi$ a very high likelihood whenever the antecedent $\varphi$ is all that we know. The strict
rule “if $\phi$ then definitely $\sigma$” (written $\phi \Rightarrow \sigma$) will be interpreted as an extreme
conditional probability statement $P(\sigma \mid \phi) = 1$. Our criterion for testing
consistency translates to determining whether, for every $\varepsilon > 0$, there exists a
probability distribution $P$ that satisfies all these conditional probabilities. Furthermore,
to match our intuition that conditional rules neither refer to empty classes nor
are confirmed by merely “falsifying” their antecedents, we also require that $P$ be
proper, that is, it does not render any antecedent as totally impossible. These
two requirements constitute the essence of our proposal.

In the language of ranked models (see Secs. 3.2 and 4.2, and also [72]), our
proposal assumes a particularly simple form. A defeasible rule $\varphi \to \psi$ imposes
the constraints that $\psi$ holds in all minimally ranked models of $\varphi$ and that there
will be at least one such model. A strict rule $\phi \Rightarrow \sigma$ imposes the constraint that
no possible world satisfies $\phi \land \neg\sigma$ and that at least one possible world satisfies $\phi$.
Consistency amounts to requiring the existence of a ranking (a mapping of models
to integers) that simultaneously satisfies all these constraints. The idea of
attaching probabilistic semantics to conditional rules goes back to Adams [1, 2], who
developed a logic of indicative conditionals based on infinitesimal probabilities.³
More recently, infinitesimal probabilities were mentioned by McCarthy [88] as a
possible interpretation of circumscription and were used by Pearl [95] to develop
a graphical consistency test for inheritance networks, extending that of
Touretzky [122]. The proposals in [97, 102, 35, 37] have extended Adams’ logic to default
schemata, and Lehmann and Magidor [74] have shown the equivalence between
Adams’ logic and a semantics based on ranked models.

Unfortunately, the notion of consistency treated in [2] and [95] was restricted
to systems involving purely defeasible rules. This chapter extends Adams’
consistency results to mixed systems containing both defeasible and strict information,
and, as we shall see, the extension is by no means trivial, since a strict rule $b \Rightarrow f$
must be given a semantics totally different from its material counterpart $b \supset f$.
For example, whereas the set of rules $\{b \supset f,\ b \supset \neg f\}$ is logically consistent,
our semantics must now render the set $\{b \Rightarrow f,\ b \Rightarrow \neg f\}$ inconsistent. The
need to distinguish between $b \Rightarrow f$ and $b \supset f$, where the former is used to
express generic knowledge and the latter as an item of evidence, is also advocated
in [35, 37, 23, 104] (see Sec. 2.7). The implications of this distinction will become
more apparent in Chapter 5, where causality is introduced in the interpretation
of the conditional rules.
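
As a short worked illustration of why the pair $\{b \Rightarrow f,\ b \Rightarrow \neg f\}$ must come out inconsistent under the probabilistic reading sketched above, the derivation below assumes only the two requirements already stated (strict rules receive conditional probability one, and $P$ is proper, so $P(b) > 0$):

```latex
% Assume P is proper for b, i.e. P(b) > 0.
\begin{align*}
P(f \mid b) + P(\neg f \mid b)
  &= \frac{P(f \land b) + P(\neg f \land b)}{P(b)}
   = \frac{P(b)}{P(b)} = 1, \\
\text{whereas } b \Rightarrow f \text{ and } b \Rightarrow \neg f
  \text{ together require }\;
P(f \mid b) + P(\neg f \mid b) &= 1 + 1 = 2 .
\end{align*}
```

No proper $P$ can meet both lines, so the strict pair is rejected. The material pair $\{b \supset f,\ b \supset \neg f\}$, by contrast, is satisfied by any world in which $b$ is false.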

In addition to extending the consistency criterion to include mixed systems,
we also present an effective syntactic procedure for testing this criterion and
identifying the set of rules responsible for the inconsistency. Finally, we analyze a
notion of entailment based on consistency considerations. Intuitively, a
conclusion is entailed by a knowledge base if it is guaranteed an arbitrarily high
probability whenever the premises are assigned sufficiently high probabilities. This
weak notion of entailment was named p-entailment in [2], $\varepsilon$-entailment in [97],
and preferential entailment in [69], and it yields (semimonotonically) the most
conservative “core” of plausible conclusions that one would wish to draw from a
conditional knowledge base [98].

³ A formal treatment of infinitesimal probabilities using nonstandard analysis is given in [74]
and also mentioned in [120].

The definition for probabilistic entailment can be partially extended to
knowledge bases containing strict information using a device suggested by Adams [1]
whereby, by definition, conditional rules whose antecedents have probability zero
are assigned probability one. Thus, a strict rule such as $\phi \Rightarrow \sigma$ could
conceivably be encoded as the defeasible rule $(\phi \land \neg\sigma) \to False$. Another proposal
was made in the preferential-models analysis of [69]. There, Kraus, Lehmann,
and Magidor write (p. 172):

We  reserve  to  ourselves  the  right  to  consider  universes  of  reference 
that  are  strict  subsets  of  the  sets  of  all  models  of  L.  In  this  way,  we 
shall be able to model strict constraints, such as penguins are birds,
in a simple and natural way, by restricting U to the set of all worlds
that satisfy the material implication $penguin \supset bird$.

Both  of  these  proposals  suffer  from  two  weaknesses.  First,  they  do  not  capture  the 
common  understanding  that  the  opposing  pair  “all  birds  fly”  and  “all  birds  don’t 
fly”  is  inconsistent,  but  instead  permit  the  conclusion  that  birds  do  not  exist, 
together  with  other  strange  consequences  such  as  “typically  birds  have  property 
P”  where  P  stands  for  any  imaginable  property.  Our  semantics  reflects  the  view, 
also  expressed  in  [23],  that  one  of  the  previous  rules  must  be  invalid  and  that 
no  admissible  model  would  support  both  rules.  Second,  these  proposals  do  not 
permit us to entail new strict rules in a more meaningful way, according to our
commonsense interpretation of conditional sentences, than logical entailment. For
example, $\neg a$ should not entail $a \Rightarrow b$, in the same way that “I am poor” should
not entail “if I were rich, it should rain tomorrow”. Thus, the special semantics
we  give  to  conditional  rules,  defeasible  as  well  as  strict,  avoids  such  paradoxes 
of  material  implication  [4]  and,  hence,  brings  mechanical  and  plausible  reasoning 
closer  together. 

This  chapter  is  organized  as  follows:  Section  2.2  introduces  notation  and  some 
preliminary  definitions.  Consistency  and  entailment  are  explored  in  Section  2.3. 
An effective procedure for testing consistency and entailment is presented in
Section 2.4, while Section 2.5 contains illustrative examples. Section 2.6 deals with
entailment in inconsistent knowledge bases, and the main results are summarized
in Section 2.7. All proofs appear in Appendix A.

2.2  Notation  and  Preliminary  Definitions


The basic language is a finite set $\mathcal{L}$ of atomic propositions augmented with two
propositional constants $T$ and $F$, which are (informally) regarded as expressing
a logical truth and a logical falsehood, respectively. Let $\mathcal{L}_p$ be a closed set of
propositional well-formed formulas (wffs) generated as usual from the atomic
propositions in $\mathcal{L}$ and the connectives $\lor$ and $\neg$. We define a world $\omega$ as a truth
assignment for the atomic propositions in $\mathcal{L}$. The set of possible worlds is denoted
by $\Omega$, and if there are $n$ atomic propositions in $\mathcal{L}$, the size of $\Omega$ will be $2^n$. The
satisfaction of a wff $\varphi \in \mathcal{L}_p$ by a world $\omega$ is defined as usual and denoted by
$\omega \models \varphi$. If $\omega$ satisfies $\varphi$, we say that $\omega$ is a model for $\varphi$.

A defeasible rule is the formula $\varphi \to \psi$, where $\varphi$ and $\psi$ are wffs in $\mathcal{L}_p$ and
$\to$ is a new binary connective. Informally, each $\varphi \to \psi$ represents an if-then rule
that admits exceptions and each may be read as “if $\varphi$ then typically $\psi$” or “if $\varphi$
then normally $\psi$”. Similarly, given $\phi$, $\sigma$ in $\mathcal{L}_p$, the new binary connective $\Rightarrow$ will
be used to form a strict rule $\phi \Rightarrow \sigma$. A strict rule $\phi \Rightarrow \sigma$ is interpreted as “if $\phi$
then definitely $\sigma$”. A formal interpretation of both strict and defeasible rules is
given in the definition of consistency (Def. 2.2). Both $\to$ and $\Rightarrow$ can occur only
as the main connective in a rule. We will use conditional rules or simply rules
when referring to a formula that can be either a defeasible or a strict rule. The
antecedent of a rule is the wff to the left of the main connective (single or double
arrow) and its consequent is the wff to the right. If $r$ denotes a conditional rule
with antecedent $\varphi$ and consequent $\psi$, then the negation of $r$, denoted by $\sim r$, is
defined as a conditional with antecedent $\varphi$ and consequent $\neg\psi$. The material
counterpart of a conditional rule with antecedent $\varphi$ and consequent $\psi$ is defined
as $\varphi \supset \psi$ (where $\supset$ denotes material implication), and the material counterpart
of a set $\Delta$ of conditional rules (denoted by $\hat{\Delta}$) is defined as the conjunction of
the material counterparts of the rules in $\Delta$.

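A default $\varphi \to \psi$ is verified by a world $\omega$ iff $\omega \models \varphi \land \psi$. The rule $\varphi \to \psi$ is falsified by
$\omega$ iff $\omega \models \varphi \land \neg\psi$. Finally, $\varphi \to \psi$ is satisfied by $\omega$ iff $\omega \models \varphi \supset \psi$. Strict rules are
verified, falsified, and satisfied in the same way.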

Definition 2.1 (Probability assignment) Let $P$ be a probability function on
the space of possible worlds $\Omega$, such that $P(\omega) \geq 0$ and $\sum_{\omega \in \Omega} P(\omega) = 1$. We
define a probability assignment $P$ on a formula $\varphi \in \mathcal{L}_p$ as

$$P(\varphi) = \sum_{\omega \models \varphi} P(\omega). \tag{2.3}$$

Let $\Delta = D \cup S$ be a set of conditional rules such that $D = \{\varphi_i \to \psi_i\}$ ($1 \leq i \leq |D|$)
and $S = \{\phi_j \Rightarrow \sigma_j\}$ ($1 \leq j \leq |S|$). A probability assignment on a defeasible rule
$\varphi \to \psi \in D$ is defined as

$$P(\varphi \to \psi) = \begin{cases} \dfrac{P(\varphi \land \psi)}{P(\varphi)} = P(\psi \mid \varphi) & \text{if } P(\varphi) > 0 \\[1ex] 1 & \text{otherwise} \end{cases} \tag{2.4}$$

We assign probabilities to the rules in $S$ in exactly the same fashion. $P$ will be
considered proper for a conditional rule $r$ with antecedent $\varphi$ if $P(\varphi) > 0$, and it
will be proper for $\Delta$ if it is proper for every conditional in $\Delta$.

□ 


The probability assignment above attaches a conditional probability interpretation
to the rules in a given $\Delta$. Eq. 2.4 states that the probability of a conditional
rule $r$ with antecedent $\varphi$ and consequent $\psi$ is equal to the probability of $r$ being
verified (i.e., $\omega \models \varphi \land \psi$) divided by the probability of its being either verified or
falsified (i.e., $\omega \models \varphi$).
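
Eq. 2.4 can be evaluated directly on a small example. The sketch below assumes two atoms, a probability function given as a list of exact weights aligned with an enumeration of the four worlds, and hypothetical helper names (`prob`, `rule_prob`, `is_proper`); the numbers are invented purely for illustration.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("bird", "fly")
# Worlds in the order (bird, fly) = (F,F), (F,T), (T,F), (T,T).
WORLDS = [dict(zip(ATOMS, v)) for v in product([False, True], repeat=len(ATOMS))]

def prob(formula, weights):
    """Eq. 2.3: add up the weights of the worlds that satisfy the formula."""
    return sum(p for w, p in zip(WORLDS, weights) if formula(w))

def rule_prob(antecedent, consequent, weights):
    """Eq. 2.4: P(verified) / P(verified or falsified), or 1 if that is 0."""
    p_ant = prob(antecedent, weights)
    if p_ant == 0:
        return Fraction(1)
    return prob(lambda w: antecedent(w) and consequent(w), weights) / p_ant

def is_proper(antecedent, weights):
    """P is proper for a rule iff its antecedent has positive probability."""
    return prob(antecedent, weights) > 0

bird = lambda w: w["bird"]
fly = lambda w: w["fly"]

# Illustrative weights only; each list sums to one.
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 40), Fraction(9, 40)]
no_birds = [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)]

p_rule = rule_prob(bird, fly, weights)
p_rule_no_birds = rule_prob(bird, fly, no_birds)
```

The second assignment gives the rule probability one only because its antecedent has probability zero; it is not proper for the rule, which is why Definition 2.2 restricts attention to proper assignments.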

Up  to  this  point  the  only  difference  between  defeasible  and  strict  rules  is 
syntactic.  They  are  assigned  probabilities  in  the  same  fashion  and  are  verified 
and falsified under the same truth assignments. Their differences will become
clear  in  the  next  section,  where  we  formally  introduce  the  notion  of  consistency. 


2.3  Probabilistic  Consistency  and  Entailment 

Throughout the rest of the chapter, $\Delta$ denotes a knowledge base of conditional
rules, $\Delta = D \cup S$, where $D = \{\varphi_i \to \psi_i\}$ ($1 \leq i \leq |D|$) and $S = \{\phi_j \Rightarrow \sigma_j\}$
($1 \leq j \leq |S|$).

Definition 2.2 (Probabilistic consistency) We say that $\Delta = D \cup S$ is
probabilistically consistent (p-consistent) if for every $\varepsilon > 0$, there is a probability
assignment $P$ that is proper for $\Delta$ such that $P(\psi \mid \varphi) \geq 1 - \varepsilon$ for all defeasible
rules $\varphi \to \psi$ in $D$ and $P(\sigma \mid \phi) = 1$ for all strict rules $\phi \Rightarrow \sigma$ in $S$.

□ 


Intuitively,  consistency  means  that  it  is  possible  for  all  defeasible  rules  to  come 
as  close  to  certainty  as  desired,  while  all  strict  rules  hold  with  absolute  certainty. 
Another way of formulating consistency is as follows: Consider a constant $\varepsilon > 0$
and let $\mathcal{P}_{\Delta,\varepsilon}$ stand for the set of proper probability assignments for $\Delta$ such that
if $P \in \mathcal{P}_{\Delta,\varepsilon}$ then $P(\psi \mid \varphi) \geq 1 - \varepsilon$ for every $\varphi \to \psi \in D$ and $P(\sigma \mid \phi) = 1$ for every
$\phi \Rightarrow \sigma \in S$. Consistency insists on $\mathcal{P}_{\Delta,\varepsilon}$ being nonempty for every $\varepsilon > 0$.

Before developing a syntactical test for consistency (Thm. 2.4), we need to
define the concept of toleration.

Definition 2.3 (Toleration) Let $r$ be a rule (either defeasible or strict) with
antecedent $\alpha$ and consequent $\beta$. We say that $r$ is tolerated by a set $\Delta$ if there
exists a world $\omega$ such that

$$\omega \models \alpha \land \beta \bigwedge_{i=1}^{i=|D|} (\varphi_i \supset \psi_i) \bigwedge_{j=1}^{j=|S|} (\phi_j \supset \sigma_j). \tag{2.5}$$

□

Thus, $r$ is tolerated by a set of conditional rules $\Delta$ if there is a world $\omega$ that
verifies $r$ and satisfies every rule in $\Delta$ (i.e., no rule in $\Delta$ is falsified by $\omega$).
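
For small propositional languages, toleration can be tested by enumerating worlds. The following sketch does exactly that; it is a brute-force illustration, exponential in the number of atoms, and not the satisfiability-based procedure of this chapter. Rules are pairs of Python functions (antecedent, consequent), and the names `worlds` and `tolerated` are hypothetical.

```python
from itertools import product

def worlds(atoms):
    """Enumerate all truth assignments over the given atom names."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def tolerated(rule, rules, atoms):
    """Eq. 2.5 by enumeration: some world verifies `rule` and
    falsifies none of `rules`."""
    ant, con = rule
    return any(
        ant(w) and con(w) and all((not a(w)) or c(w) for a, c in rules)
        for w in worlds(atoms)
    )

ATOMS = ("b", "f", "p")                                   # bird, fly, penguin
birds_fly = (lambda w: w["b"], lambda w: w["f"])
penguins_birds = (lambda w: w["p"], lambda w: w["b"])
penguins_dont_fly = (lambda w: w["p"], lambda w: not w["f"])
kb = [birds_fly, penguins_birds, penguins_dont_fly]

general_rule_ok = tolerated(birds_fly, kb, ATOMS)
specific_rule_ok = tolerated(penguins_dont_fly, kb, ATOMS)
specific_rule_alone = tolerated(penguins_dont_fly,
                                [penguins_birds, penguins_dont_fly], ATOMS)
```

In this example the general rule is tolerated by the whole set (a flying non-penguin bird verifies it), while the penguin rule is tolerated only after the general rule is set aside.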

Theorem 2.4 Let $\Delta = D \cup S$ be a nonempty set of defeasible and strict rules.
$\Delta$ is p-consistent iff every nonempty subset $\Delta' = D' \cup S'$ of $\Delta$ complies with one
of the following:

1. If $D'$ is not empty, then there must be at least one defeasible rule in $D'$
tolerated by $\Delta'$.

2. If $D'$ is empty (i.e., $\Delta' = S'$), each strict rule in $S'$ must be tolerated by $S'$.

The following corollary ensures that, in order to determine p-consistency, it is
not necessary to check literally every nonempty subset of $\Delta$.

Corollary 2.5 $\Delta = D \cup S$ is p-consistent iff we can build an ordered partition
of $D = [D_1, D_2, \ldots, D_n]$ where

1. For all $1 \leq i \leq n$, each rule in $D_i$ is tolerated by $S \cup \bigcup_{j=i+1}^{n} D_j$.

2. Every rule in $S$ is tolerated by $S$.

Corollary 2.5 reflects the following considerations (see proof in Appendix A): If
$\Delta$ is p-consistent, Theorem 2.4 ensures the construction of the ordered partition.
On the other hand, if this partition can be built, the proof of Theorem 2.4 shows
that a probability assignment can be constructed to comply with the
requirements of Def. 2.2. Corollary 2.5 yields a simple and effective decision procedure
for determining p-consistency and identifying the inconsistent subset in $\Delta$ (see
Sec. 2.4).
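
A layer-by-layer test in this spirit can be sketched as follows. The code is an illustration under stated assumptions, not the procedure of Sec. 2.4 itself: toleration is decided by brute-force enumeration of worlds, each layer collects the rules tolerated by the strict rules together with all defeasible rules not yet placed (the test used in step 2.1 of Procedure Test_Consistency), and rules are hypothetical triples (name, antecedent, consequent). The four knowledge bases are the bird and penguin variants discussed in Sec. 2.1.

```python
from itertools import product

def worlds(atoms):
    """Enumerate all truth assignments over the given atom names."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def tolerated(rule, rules, atoms):
    """Some world verifies `rule` and falsifies none of `rules`."""
    _, ant, con = rule
    return any(ant(w) and con(w) and all(not a(w) or c(w) for _, a, c in rules)
               for w in worlds(atoms))

def p_consistent(defeasible, strict, atoms):
    """Return (True, layers of rule names) if a partition can be built,
    else (False, names of the rules among which none is tolerated)."""
    for s in strict:
        if not tolerated(s, strict, atoms):
            return False, [s[0]]
    remaining, layers = list(defeasible), []
    while remaining:
        layer = [d for d in remaining
                 if tolerated(d, strict + remaining, atoms)]
        if not layer:                       # no rule tolerated: inconsistent
            return False, [d[0] for d in remaining]
        layers.append([d[0] for d in layer])
        remaining = [d for d in remaining if d not in layer]
    return True, layers

ATOMS = ("b", "f", "p")                     # bird, fly, penguin
b_f = ("b-f", lambda w: w["b"], lambda w: w["f"])
p_b = ("p-b", lambda w: w["p"], lambda w: w["b"])
p_nf = ("p-not-f", lambda w: w["p"], lambda w: not w["f"])
b_p = ("b-p", lambda w: w["b"], lambda w: w["p"])

kb1 = p_consistent([p_b, p_nf], [b_f], ATOMS)       # "all birds fly"
kb2 = p_consistent([b_f, p_b, p_nf], [], ATOMS)     # "typically birds fly"
kb3 = p_consistent([b_f, p_nf], [p_b], ATOMS)       # "all penguins are birds"
kb4 = p_consistent([b_f, p_nf, b_p], [p_b], ATOMS)  # add "birds are penguins"
```

On these inputs the sketch reproduces the intuitions of Sec. 2.1: the first and fourth knowledge bases are rejected, and the second and third are accepted with the general rule placed in an earlier layer than the penguin rules.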

Before turning to the task of entailing new rules, we need to make explicit a
particular form of inconsistency.

Definition 2.6 (Substantive inconsistency) Let $\Delta$ be a p-consistent set of
conditional rules, and let $r'$ be a conditional rule with antecedent $\phi$. We will
say that $r'$ is substantively inconsistent with respect to $\Delta$ if $\Delta \cup \{\phi \to True\}$ is
p-consistent but $\Delta \cup \{r'\}$ is p-inconsistent.

□ 


Nonsubstantive  inconsistency  occurs  whenever  the  antecedent  of  a  conditional 
rule is logically incompatible with the strict rules of a consistent set $\Delta$. It will
become apparent from the theorems to follow that a rule $r$ is nonsubstantively
inconsistent with respect to a consistent $\Delta$ iff both $\Delta \cup \{r\}$ and $\Delta \cup \{\sim r\}$ are
inconsistent.

The  concept  of  entailment  introduced  below  is  based  on  the  same  probabilistic 
interpretation  as  the  one  used  in  the  definition  of  p-consistency.  Intuitively,  we 
want  p-entailed  conclusions  to  receive  arbitrarily  high  probability  in  every  proper 
probability  distribution  in  which  the  defeasible  premises  have  sufficiently  high 
probability  and  in  which  the  strict  premises  have  probability  equal  to  one. 

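Definition 2.7 (p-Entailment) Given a p-consistent set $\Delta$ of conditional rules,
$\Delta$ p-entails $\varphi' \to \psi'$ (written $\Delta \models_p \varphi' \to \psi'$) if for all $\varepsilon > 0$ there exists $\delta > 0$
such that

1. There exists at least one $P \in \mathcal{P}_{\Delta,\delta}$⁴ such that $P$ is proper for $\varphi' \to \psi'$.

2. Every $P' \in \mathcal{P}_{\Delta,\delta}$ satisfies $P'(\psi' \mid \varphi') \geq 1 - \varepsilon$.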


□ 


Theorem  2.8  relates  the  notions  of  entailment  and  consistency. 

Theorem 2.8 If $\Delta$ is p-consistent, $\Delta$ p-entails $\varphi' \to \psi'$ iff $\varphi' \to \neg\psi'$ is
substantively inconsistent with respect to $\Delta$.
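
Theorem 2.8 reduces p-entailment to two consistency tests, which can be sketched as follows. The code is illustrative only: it assumes that $\Delta$ is already p-consistent, decides toleration by brute-force enumeration of worlds, and uses hypothetical names (`p_consistent`, `p_entails`) with rules given as pairs (antecedent, consequent). The atom `r` (for “red”) is an invented, unrelated property.

```python
from itertools import product

def worlds(atoms):
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def tolerated(rule, rules, atoms):
    """Some world verifies `rule` and falsifies none of `rules`."""
    ant, con = rule
    return any(ant(w) and con(w) and all(not a(w) or c(w) for a, c in rules)
               for w in worlds(atoms))

def p_consistent(defeasible, strict, atoms):
    """Layer-by-layer consistency test (brute-force toleration)."""
    if not all(tolerated(s, strict, atoms) for s in strict):
        return False
    remaining = list(defeasible)
    while remaining:
        layer = [d for d in remaining
                 if tolerated(d, strict + remaining, atoms)]
        if not layer:
            return False
        remaining = [d for d in remaining if d not in layer]
    return True

def p_entails(defeasible, strict, query, atoms):
    """Theorem 2.8: the denial of the query is substantively inconsistent."""
    ant, con = query
    presupposition = (ant, lambda w: True)       # antecedent -> True
    denial = (ant, lambda w: not con(w))         # antecedent -> not consequent
    return (p_consistent(defeasible + [presupposition], strict, atoms)
            and not p_consistent(defeasible + [denial], strict, atoms))

ATOMS = ("b", "f", "p", "r")                     # bird, fly, penguin, red
D = [(lambda w: w["b"], lambda w: w["f"]),       # typically birds fly
     (lambda w: w["p"], lambda w: not w["f"])]   # typically penguins do not fly
S = [(lambda w: w["p"], lambda w: w["b"])]       # all penguins are birds

penguin_birds_dont_fly = p_entails(
    D, S, (lambda w: w["p"] and w["b"], lambda w: not w["f"]), ATOMS)
penguins_fly = p_entails(D, S, (lambda w: w["p"], lambda w: w["f"]), ATOMS)
red_birds_fly = p_entails(
    D, S, (lambda w: w["r"] and w["b"], lambda w: w["f"]), ATOMS)
```

The last query is not p-entailed even though redness is unrelated to flying, which illustrates the sense in which p-entailment yields only a conservative core of conclusions.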

Def.  2.9  and  Theorem  2.10  characterize  the  conditions  under  which  condi¬ 
tional  conclusions  are  guaranteed  not  only  very  high  likelihood  but  also  absolute 
certainty.  We  call  this  form  of  entailment  strict  p-entailment. 

⁴ Recall that given a consistent $\Delta = D \cup S$, $\mathcal{P}_{\Delta,\delta}$ stands for the set of probability assignments
proper for $\Delta$, such that if $P \in \mathcal{P}_{\Delta,\delta}$ then $P(\psi \mid \varphi) \geq 1 - \delta$ for every $\varphi \to \psi \in D$ and $P(\sigma \mid \phi) = 1$
for every $\phi \Rightarrow \sigma \in S$ (see Def. 2.2).

Definition 2.9 (Strict p-entailment) Given a p-consistent set $\Delta$ of
conditional rules, $\Delta$ strictly p-entails $\phi' \Rightarrow \sigma'$ (written $\Delta \models_s \phi' \Rightarrow \sigma'$) if for all
$\varepsilon > 0$


1. There exists at least one $P \in \mathcal{P}_{\Delta,\varepsilon}$ such that $P$ is proper for $\phi' \Rightarrow \sigma'$.

2. Every $P' \in \mathcal{P}_{\Delta,\varepsilon}$ satisfies $P'(\sigma' \mid \phi') = 1$.


□ 

Theorem 2.10 If $\Delta = D \cup S$ is p-consistent, $\Delta$ strictly p-entails $\phi' \Rightarrow \sigma'$ iff
$S \cup \{\phi' \to True\}$ is p-consistent and there exists a subset $S'$ of $S$ such that
$\phi' \Rightarrow \neg\sigma'$ is not tolerated by $S'$.

Examples of strict p-entailment are contraposition, $\{\phi \Rightarrow \psi\} \models_s \neg\psi \Rightarrow \neg\phi$,⁵
and chaining, $\{\phi \Rightarrow \sigma,\ \sigma \Rightarrow \psi\} \models_s \phi \Rightarrow \psi$. Note that strict p-entailment subsumes
p-entailment, that is, if a conditional rule is strictly p-entailed, then it is also
p-entailed. Also, to test whether a conditional rule is strictly p-entailed, we need
to check its status only with respect to the strict set in $\Delta$. This confirms the
intuition that we cannot deduce “hard” rules from “soft” ones.

Note that the requirements of substantive consistency in Theorem 2.10 and
properness for the probability distributions in Definition 2.2 distinguish strict
rules from their material counterparts and establish a difference between strict
p-entailment and logical entailment. For example, consider the knowledge base
Δ = S = {c ⇒ ¬a}, which is clearly p-consistent. While Δ̂ = {c ⊃ ¬a} logically
entails c ∧ a ⊃ b, Δ does not strictly p-entail c ∧ a ⇒ b, since the antecedent c ∧ a
is always falsified.
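
The contrast in this example can be checked mechanically. The following is a
minimal sketch in plain Python (the names ATOMS, all_worlds, and models are
illustrative and do not come from the text). It enumerates the eight worlds over
a, b, c and confirms that c ⊃ ¬a entails c ∧ a ⊃ b only vacuously: no world
satisfying c ⊃ ¬a makes the antecedent c ∧ a true, so no admissible probability
assignment can be proper for c ∧ a ⇒ b.

```python
from itertools import product

ATOMS = ("a", "b", "c")

def all_worlds():
    """Yield every truth assignment over ATOMS as a dict atom -> bool."""
    for values in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

# Worlds that satisfy the material counterpart c ⊃ ¬a.
models = [w for w in all_worlds() if (not w["c"]) or (not w["a"])]

# Logical entailment of c ∧ a ⊃ b: the implication holds in every model.
materially_entailed = all((not (w["c"] and w["a"])) or w["b"] for w in models)

# Properness needs at least one admissible world where c ∧ a holds.
antecedent_possible = any(w["c"] and w["a"] for w in models)

print(materially_entailed, antecedent_possible)  # True False
```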

Theorems 2.11 and 2.12 present additional results relating consistency and
entailment. They follow immediately from previous theorems and definitions.
Versions of these theorems, for the case of knowledge bases containing only
defeasible rules, first appeared in [2].

Theorem 2.11 If Δ does not p-entail φ' → ψ', and φ' → ψ' is substantively
inconsistent with respect to Δ, then for all ε > 0 there exists a probability
assignment P' ∈ 𝒫_{Δ,ε} which is proper for Δ and φ' → ψ' such that P'(ψ'|φ') < ε.

Theorem 2.12 If Δ = D ∪ S is p-consistent, then it cannot be the case that

1. Both φ → ψ and φ → ¬ψ are substantively inconsistent with respect to Δ.

2. Both φ ⇒ σ and φ ⇒ ¬σ are substantively inconsistent with respect to S.

⁵ Whenever ¬ψ is satisfiable.

Procedure Test_Consistency
Input: A set of rules Δ = D ∪ S.

Output: Yes/No depending on whether Δ is consistent.

1. Let D' := D.

2. While D' is not empty, do:

2.1 Find a rule d : φ → ψ ∈ D' such that d is tolerated by S ∪ D'; let
D' := D' − d.

2.2 If d is not found, abort: Return(No), Δ is inconsistent.

3. Let S' := S.

4. While S' is not empty, do:

4.1 Pick any rule s : φ ⇒ σ ∈ S'; if s is tolerated by S, then let S' := S' − s.

4.2 Else abort: Return(No), Δ is inconsistent.

5. Return(Yes), Δ is consistent.

End Procedure

Figure 2.1: An effective procedure for testing consistency in O(|D|² + |S|)
propositional satisfiability tests.

2.4  An  Effective  Procedure  for  Testing  Consistency 

In accordance with Theorem 2.4 and following Corollary 2.5, the consistency of
Δ = D ∪ S can be tested in two phases. In the first phase, until D is empty
we repeatedly remove from D a defeasible rule that is tolerated by the rest of
the rules in D ∪ S. In the second phase, we test whether every strict rule in S
is tolerated by the rest of S (without removing any rule). If both phases can
be successfully completed, Δ is consistent; if not, Δ is inconsistent. Procedure
Test_Consistency is formally presented in Figure 2.1.
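
The two phases can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: a rule is encoded as a triple
(antecedent, consequent, is_strict) of predicates over a world, toleration is
read as in the examples of Section 2.5 (some world verifies the rule and
falsifies none of the rules in the given set), and the satisfiability test is
replaced by brute-force enumeration of truth assignments. The names worlds,
tolerated, and is_p_consistent are hypothetical.

```python
from itertools import product

def worlds(atoms):
    """Enumerate all truth assignments over the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def tolerated(rule, rules, atoms):
    """True if some world verifies `rule` and falsifies no rule in `rules`."""
    ante, cons, _ = rule
    return any(
        ante(w) and cons(w) and all((not a(w)) or c(w) for a, c, _ in rules)
        for w in worlds(atoms)
    )

def is_p_consistent(rules, atoms):
    """Two-phase test following the steps of Figure 2.1."""
    strict = [r for r in rules if r[2]]
    remaining = [r for r in rules if not r[2]]
    # Phase 1: repeatedly remove a defeasible rule tolerated by S ∪ D'.
    while remaining:
        d = next((r for r in remaining
                  if tolerated(r, strict + remaining, atoms)), None)
        if d is None:
            return False
        remaining.remove(d)
    # Phase 2: every strict rule must be tolerated by S (nothing is removed).
    return all(tolerated(s, strict, atoms) for s in strict)

atoms = ["p", "b", "f"]
p = lambda w: w["p"]
b = lambda w: w["b"]
f = lambda w: w["f"]
not_f = lambda w: not w["f"]

# Example 2.2 with the strict rule b ⇒ f, and the all-defeasible penguin triangle.
strict_birds = [(b, f, True), (p, b, False), (p, not_f, False)]
penguin_triangle = [(b, f, False), (p, b, False), (p, not_f, False)]

print(is_p_consistent(strict_birds, atoms))      # False
print(is_p_consistent(penguin_triangle, atoms))  # True
```

The enumeration is exponential in the number of atoms; it stands in for the
satisfiability oracle only to keep the sketch self-contained.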

The same procedure can be used for entailment, since to determine whether a
defeasible rule d' is entailed by Δ we need only test the consistency of Δ ∪ {~d'}
and Δ ∪ {d'} (to make sure that the former is substantively inconsistent). Given
that the procedure in Figure 2.1 is a sound and complete test for deciding
p-consistency, the next theorem establishes an upper bound for the problem of
deciding p-consistency (and p-entailment). Theorem 2.13 and the correctness of
the procedure Test_Consistency are proven in Appendix A.




Theorem 2.13 The worst case complexity of testing consistency (or entailment)
is bounded by [𝒫𝒮 × (|D|²/2 + |S|)], where |D| and |S| are the number of defeasible
and strict rules, respectively, and 𝒫𝒮 is the complexity of testing propositional
satisfiability for the material counterpart of the rules in the database.

Thus, the complexity of deciding p-consistency and p-entailment is no worse
than that of propositional satisfiability. Although the general satisfiability
problem is NP-complete, useful sublanguages (e.g., Horn clauses) are known to admit
polynomial algorithms [25].

The order in which rules are removed in procedure Test_Consistency induces
natural priorities among defaults that have been used to great advantage in
several proposals for default reasoning, as is shown in Chapter 4 (see also
[50, 100, 47, 36]). These priorities have an alternative epistemic interpretation
in the theory of belief revision described by Gärdenfors [33]. The fact that a
conditional φ → ψ is tolerated by all those rules that were not previously removed
from Δ means that if φ holds, then ψ can be asserted without violating any rule in
Δ that is more deeply entrenched than this conditional. In other words, adding the
assertion φ ∧ ψ would require a minimal revision of the set of beliefs supported
by Δ. The formal relation between the default priorities used in both system-Z
[100] and system-Z+ [50] (see Sec. 4.6) and the postulates for epistemic
entrenchment in belief revision [33] is studied by Boutilier [13]. The origin of
this priority ordering can be traced back to Adams [2], where it is used to build
“nested sequences” of subsets of Δ that yield consistent, high probability models.
Such “nested sequences” are used in the proof of Theorem 2.4 (see Appendix A). A
similar construction was also used in [72, Theorem 5] to prove the
co-NP-completeness of p-entailment in the case of knowledge bases containing only
defeasible rules.

Once a set of rules is found to be p-inconsistent, it would be useful to
identify the rules that are directly responsible for the contradiction.
Unfortunately, the toleration relation is not strong enough to accomplish this
task since it is incapable of distinguishing a rule “causing” the inconsistency
from one that is a “victim” of the inconsistency. For example, consider the
inconsistent set Di = {φ → ψ, φ → ¬ψ, φ → σ}. Since no rule in Di is tolerated,
the consistency test will immediately halt and declare Di inconsistent. Yet φ → σ
can hardly be held responsible for the inconsistency; φ → σ is not tolerated
because the material counterpart of the pair {φ → ψ, φ → ¬ψ} renders φ
impossible.⁶ It would be inappropriate to treat a rule as the source of
inconsistency merely because it is not tolerated in the context of an
unconfirmable subset. Rather, we would like to proclaim a rule inconsistent if its
removal would improve the consistency of the database. In other words, a
conditional rule r is inconsistent with respect to a set Δ iff there is an
inconsistent subset of Δ that becomes consistent after r is removed. Formally,

⁶ Note that {φ ⊃ ψ, φ ⊃ ¬ψ} ⊨ ¬φ.

Definition 2.14 (Inconsistent rule) A rule r is inconsistent with respect to a
set Δ iff there exists a subset Δ' of Δ such that Δ' ∪ {r} is p-inconsistent but Δ'
in itself is p-consistent.

□ 

Deciding whether a given rule is inconsistent is difficult because, unlike the test for
set inconsistency, the search for the indicative subset Δ' cannot be systematized
as in procedure Test_Consistency. All indications are that the search for such a
subset will require exponential time. Simple-minded procedures based on removing
one rule at a time and testing for consistency in the remaining set do not yield
the desired results. In Δ' = {a → b, a → ¬b, a → c, a ⇒ ¬c} every rule is
inconsistent, yet it is necessary to remove at least two rules at a time in order
to render the remaining set consistent. Likewise, in
Δ'' = {a → b, a → ¬b, a → c, c ⇒ ¬b} every rule is inconsistent, yet only the
removal of a → b renders the remaining set consistent (or confirmable).
Approximate methods for identifying inconsistent rules are discussed in
Section 2.6 and in the proof of Theorem 2.24 (see Appendix A).
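
To make the cost of Definition 2.14 concrete, the sketch below applies it
literally by searching all subsets of Δ that exclude r, which is exponential in
the number of rules. It is an illustration only, not one of the approximate
methods referred to above. Rules are encoded as hypothetical triples
(antecedent, consequent, is_strict), and the consistency test is a brute-force
rendering of Figure 2.1; all function names are assumptions.

```python
from itertools import combinations, product

def tolerated(rule, rules, atoms):
    """Some world verifies `rule` and falsifies no rule in `rules`."""
    ante, cons, _ = rule
    for values in product([False, True], repeat=len(atoms)):
        w = dict(zip(atoms, values))
        if ante(w) and cons(w) and all((not a(w)) or c(w) for a, c, _ in rules):
            return True
    return False

def is_p_consistent(rules, atoms):
    """Brute-force version of the two-phase test of Figure 2.1."""
    strict = [r for r in rules if r[2]]
    remaining = [r for r in rules if not r[2]]
    while remaining:
        d = next((r for r in remaining
                  if tolerated(r, strict + remaining, atoms)), None)
        if d is None:
            return False
        remaining.remove(d)
    return all(tolerated(s, strict, atoms) for s in strict)

def is_inconsistent_rule(rule, rules, atoms):
    """Definition 2.14 by exhaustive search over subsets that exclude `rule`."""
    others = [x for x in rules if x is not rule]
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            sub = list(subset)
            if is_p_consistent(sub, atoms) and not is_p_consistent(sub + [rule], atoms):
                return True
    return False

atoms = ["a", "b", "c"]
a = lambda w: w["a"]
b = lambda w: w["b"]
c = lambda w: w["c"]
not_b = lambda w: not w["b"]
not_c = lambda w: not w["c"]

# Δ' = {a → b, a → ¬b, a → c, a ⇒ ¬c} from the text.
delta_prime = [(a, b, False), (a, not_b, False), (a, c, False), (a, not_c, True)]
flags = [is_inconsistent_rule(r, delta_prime, atoms) for r in delta_prime]
print(flags)  # [True, True, True, True]
```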

2.5  Examples 

The following examples depict some of the rule interactions commonly found in
everyday discourse which motivated the development of nonmonotonic logics and
formalisms for default reasoning. They represent benchmarks in nonmonotonic
reasoning and will be used throughout the thesis. As a common denominator,
Examples 2.1, 2.2, and 2.3 contain a pair of conflicting rules. Example 2.1 refers
to the case of one if-then rule denoting what is generally the case, “if φ then ψ”,
and another if-then rule representing an exception (φ and γ) to the first one, “if
φ and γ then ¬ψ”. Example 2.2 is similar, except that the antecedents of the
conflicting rules “if φ then ψ” and “if γ then ¬ψ” are related through a third
rule, “if γ then φ”, which points out that γ is a more specific context than φ.
Finally, the antecedents in the rules of Example 2.3 are unrelated. Thus, the
conflict cannot be resolved and the conclusion remains ambiguous (i.e., neither
ψ nor ¬ψ is sanctioned). In all the examples, the rules are modified slightly to
highlight the differences between exceptions, contradictions, and ambiguities.

Example 2.1 (Dead battery) Consider the following rules:

1. t ⇒ c (“if I turn the ignition key, definitely the car will start”).

2. t ∧ b → ¬c (“if I turn the key and the battery is dead, then normally the
car will not start”).

This knowledge base is p-inconsistent: Any world ω ⊨ t ∧ b ∧ ¬c (verifying
Rule 2, the only defeasible rule) will falsify Rule 1. Intuitively, if the car engine
will always start when the ignition key is turned, we cannot accept any faults
(e.g., a dead battery). By changing the first rule to be defeasible, we obtain a
p-consistent knowledge base Δc:

1. t → c (“if I turn the ignition key, then normally the car will start”).

2. t ∧ b → ¬c (“if I turn the key and the battery is dead, then normally the
car will not start”).

The first rule is tolerated by the second using any world ω ⊨ t ∧ ¬b ∧ c (and
once Rule 1 is removed, Rule 2 is trivially tolerated by the remaining empty set).
Among the p-entailed conclusions, we have

1. Δc ⊨_p t → c (“if I turn the ignition key, then normally the car will start”).

2. Δc ⊨_p t ∧ b → ¬c (“if I turn the ignition key and the battery is dead, then
normally the car will not start”).

3. Δc ⊨_p t → ¬b (“normally, when I turn the ignition key the battery is not
dead”).

Example 2.2 (Penguins and birds) Consider the knowledge base presented
in the introduction:

1. b ⇒ f (“all birds fly”).

2. p → b (“typically penguins are birds”).

3. p → ¬f (“typically penguins don’t fly”).

Clearly, none of the defeasible rules in the example can be tolerated by the rest.
Consider a world ω, such that ω ⊨ p ∧ b (testing whether Rule 2 is tolerated). If
ω ⊨ f Rule 3 will be falsified, while if ω ⊨ ¬f Rule 1 will be falsified. Thus, we
conclude that there is no world such that Rule 2 is tolerated. A similar situation
arises when we check whether Rule 3 can be tolerated. Thus, this knowledge base
is p-inconsistent. Making Rule 1 defeasible yields the so-called “penguin
triangle”, Dp = {b → f, p → b, p → ¬f}, which is p-consistent: b → f is tolerated
by Rules 2 and 3 through the world ω', where ω' ⊨ b ∧ f and ω' ⊨ ¬p, and,
once Rule 1 is removed, the remaining rules tolerate each other. Dp becomes
p-inconsistent by adding the rule b → p (“typically birds are penguins”), in
conformity with the graphical criterion of [95]. Note that, by Theorem 2.8, the rule
b → ¬p (“typically birds are not penguins”) is then p-entailed by Dp. To
demonstrate an inconsistency that cannot be detected by such graphical criteria,
consider adding to Dp the rule p ∧ b → f. Here b → f is still tolerated through ω',
but once it is removed no remaining rule is tolerated, so the set is again
proclaimed p-inconsistent, thus showing (by Thm. 2.8) that p ∧ b → ¬f is
p-entailed by Dp as expected (“typically penguin-birds don’t fly”). Interestingly,
all these conclusions remain valid upon changing Rule 2 into a strict conditional
p ⇒ b (which is the usual way of representing the penguin triangle), showing that
strict class subsumption is not really necessary for facilitating specificity-based
preferences in this example.

Example 2.3 (Quakers and Republicans) Consider the following set of rules:

1. n → r (“typically Nixonites⁷ are Republicans”).

2. n → q (“typically Nixonites are Quakers”).

3. q ⇒ p (“all Quakers are pacifists”).

4. r ⇒ ¬p (“all Republicans are nonpacifists”).

5. p → c (“typically pacifists are persecuted”).

Rule 5 is tolerated by all others, but the remaining rules are not confirmable,
hence inconsistent. The following modification renders the knowledge base
consistent:

1. n ⇒ r (“all Nixonites are Republicans”).

2. n ⇒ q (“all Nixonites are Quakers”).

3. q → p (“typically Quakers are pacifists”).

4. r → ¬p (“typically Republicans are nonpacifists”).

5. p → c (“typically pacifists are persecuted”).

⁷ “Nixonites” is shorthand for people who share aspects of Richard M. Nixon’s cultural
background.

Indeed, there is a basic conceptual difference between the former case and this
one. If all Quakers are pacifists and all Republicans are nonpacifists, our
intuition immediately reacts against the idea of finding an individual who is both
a Quaker and a Republican. The modified knowledge base, on the other hand,
allows a Nixonite who is both a Quaker and a Republican to be either pacifist
or nonpacifist. Note that both n → p and n → ¬p are consistent when added
to the knowledge base, so neither one is p-entailed and we can assert that the
conclusion is ambiguous (i.e., we cannot decide whether a Nixonite is typically a
pacifist or not).

Finally, if we make Rules 2 and 4 the only strict rules, we get a knowledge
base similar in structure to the example depicted by network F6 in [58]:

1. n → r (“typically Nixonites are Republicans”).

2. n ⇒ q (“all Nixonites are Quakers”).

3. q → p (“typically Quakers are pacifists”).

4. r ⇒ ¬p (“all Republicans are nonpacifists”).

5. p → c (“typically pacifists are persecuted”).

Not surprisingly, the criterion of Theorem 2.4 renders this knowledge base
consistent and n → ¬p is p-entailed in conformity with the intuition expressed in [58].

2.6  Reasoning  with  p-Inconsistent  Knowledge  Bases 

The theory developed in previous sections presents desirable features from both
the semantic and computational standpoints. However, the entailment procedure
insists on starting with a p-consistent set of conditional rules. In this section, we
relax this requirement and explore two proposals for making entailment
insensitive to contradictory statements in unrelated portions of the knowledge base,
so that mistakes in the encoding of properties about penguins and birds would
not tamper with our ability to reason about politicians (e.g., Quakers and
Republicans). The first proposal amounts to accepting local p-inconsistencies as
deliberate albeit strange expressions, while the second treats them as
programming “bugs”.

In Def. 2.1 a conditional rule φ → ψ was assigned the conditional probability
P(ψ|φ) if P was proper for φ → ψ (i.e., if P(φ) > 0). In our first proposal for
reasoning with p-inconsistent knowledge bases, we will regard improper
probability assignments as admissible and define P(ψ|φ) = 1 whenever P(φ) = 0.⁸
With this approach, any set Δ, as long as Δ̂ is logically satisfiable,⁹ can be
represented by the trivial, high probability distribution in which some antecedents
receive zero probability. Also, strict rules such as φ ⇒ σ can be represented as
φ ∧ ¬σ → False, since we can now use P(φ ∧ ¬σ) = 0 to get P(σ|φ) = 1. As
before, we say that a rule φ → ψ is implied¹⁰ by a (possibly p-inconsistent) set
Δ if φ → ψ receives arbitrarily high probability in all probability assignments in
which rules in Δ receive arbitrarily high probability.

Definition 2.15 (p₁-Implication) Given a set Δ and a rule φ' → ψ', Δ
p₁-implies φ' → ψ', written Δ ⊨_{p₁} φ' → ψ', if for all ε > 0 there exists a δ > 0 such
that for all probability assignments P, if P(ψ|φ) > 1 − δ for all φ → ψ ∈ Δ and
P(σ|φ) = 1 for all φ ⇒ σ ∈ Δ, then P(ψ'|φ') > 1 − ε.

□ 

The  only  difference  between  Def.  2.15  and  that  of  p-entailment  (Def.  2.7)  is  that 
none  of  the  probability  assignments  in  the  definition  above  are  constrained  to  be 
proper. 

Any p-inconsistent Δ will have a nonempty subset violating one of the
conditions of Theorem 2.4. Given that almost all properties stated in this section
will refer to such sets, we find it convenient to introduce the following definition:

Definition 2.16 (Unconfirmable set) Δ = D ∪ S is said to be unconfirmable
if one of the following conditions is true:

1. If D is nonempty, then there cannot be a defeasible rule in D that is
tolerated by Δ.

2. If D is empty (i.e., Δ = S), then there must be a strict rule in S that is
not tolerated by Δ.

□ 
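
Definition 2.16 translates almost directly into a check. The sketch below is a
minimal illustration in Python under the same assumptions used for toleration in
the examples of Section 2.5 (a world verifies the rule and falsifies no rule of
the set); rules are hypothetical triples (antecedent, consequent, is_strict), and
the function names are not from the text. The usage shows the first knowledge
base of Example 2.3: the full set is confirmable through Rule 5, while Rules 1-4
alone form an unconfirmable subset.

```python
from itertools import product

def tolerated(rule, rules, atoms):
    """Some world verifies `rule` and falsifies no rule in `rules`."""
    ante, cons, _ = rule
    for values in product([False, True], repeat=len(atoms)):
        w = dict(zip(atoms, values))
        if ante(w) and cons(w) and all((not a(w)) or c(w) for a, c, _ in rules):
            return True
    return False

def is_unconfirmable(rules, atoms):
    """Conditions 1 and 2 of Definition 2.16."""
    defeasible = [x for x in rules if not x[2]]
    if defeasible:
        # Condition 1: no defeasible rule is tolerated by the whole set.
        return not any(tolerated(d, rules, atoms) for d in defeasible)
    # Condition 2: only strict rules, and at least one is not tolerated.
    return any(not tolerated(s, rules, atoms) for s in rules)

atoms = ["n", "r", "q", "p", "c"]
n = lambda w: w["n"]
r = lambda w: w["r"]
q = lambda w: w["q"]
p = lambda w: w["p"]
c = lambda w: w["c"]
not_p = lambda w: not w["p"]

# First knowledge base of Example 2.3.
quakers = [(n, r, False), (n, q, False), (q, p, True), (r, not_p, True), (p, c, False)]

print(is_unconfirmable(quakers, atoms))      # False: Rule 5 is tolerated
print(is_unconfirmable(quakers[:4], atoms))  # True: Rules 1-4 are not confirmable
```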

⁸ Even though P(φ → ψ) = 1 if P(φ) = 0 in Def. 2.1, P(φ → ψ) was not related to a
conditional probability in those cases.

⁹ If Δ̂ is not satisfiable this proposal cannot do better than propositional logic, that is, any
conditional rule will be trivially entailed.

¹⁰ We will use the term “implication” instead of “entailment” to stress the fact that the set of
premises may constitute a p-inconsistent set. For simplicity, however, we will keep the symbol ⊨.

Note that a set Δu can be unconfirmable, while a superset of Δu or one
of its subsets can be confirmable. The problem of deciding whether a rule is
p₁-implied is no worse than that of deciding p-entailment, as shown by the next
theorem (proven in [48]).

Theorem 2.17 Δ p₁-implies φ → ψ iff φ → ¬ψ belongs to an unconfirmable
subset of Δ ∪ {φ → ¬ψ}.

This unconfirmable subset can be identified using the p-consistency test discussed
in Section 2.4 (Fig. 2.1), and it follows that p₁-implication also requires a
polynomial number of satisfiability tests. Moreover, p-entailment is equivalent to
p₁-implication if Δ is p-consistent (see Thm. 2.23). For example, consider the
union of Dp = {b → f, p → b, p → ¬f} of Example 2.2 (encoding the penguin
triangle) and the p-inconsistent set Di = {φ → ψ, φ → ¬ψ, φ → σ}. Some of the
rules p₁-implied by Δi = Dp ∪ Di are p ∧ b → ¬f (“typically, penguin-birds don’t
fly”), b → ¬p (“typically birds are not penguins”), and φ → σ. Some of the rules
not p₁-implied by Δi are p ∧ b → f and p → f. Thus, despite its p-inconsistency,
not all rules are p₁-implied by Δi. However, this example also demonstrates a
disturbing feature of p₁-implication: Not only φ → ψ and φ → ¬ψ but also
φ → ¬σ and φ → p (where p is any predicate) are p₁-implied. Thus, although
the natural properties of penguins remain unperturbed by the p-inconsistency of
Di, strange rules such as φ → p are deduced even though there is no argument
to support them (see [58] for similar considerations on inconsistent rules in the
context of inheritance networks).

To  locate  the  source  of  this  phenomenon,  it  is  useful  to  declare  a  formula  to 
be  inconsistent  if  the  formula  is  False  by  default. 

Definition 2.18 (Inconsistent formula) Given a set Δ and a formula φ, we
say that φ is an inconsistent formula with respect to Δ iff Δ p₁-implies φ → False.
□

The next theorem relates p₁-implication to Definition 2.18 and provides an
alternative definition of inconsistent formulas in terms of propositional entailment.
It is an easy consequence of Theorem 2.17.

Theorem 2.19 Consider a set Δ of conditional rules and the formulas σ and ψ:

1. Δ ⊨_{p₁} σ → ψ iff σ is an inconsistent formula with respect to Δ ∪ {σ → ¬ψ}.

2. If σ is an inconsistent formula with respect to Δ, any conditional rule with
σ as antecedent will be p₁-implied by Δ.

3. A formula σ is inconsistent with respect to a set Δ iff there exists an
unconfirmable subset Δ' of Δ such that Δ̂' ⊨ ¬σ,¹¹ where σ is the antecedent
of a rule in Δ'.

Theorem 2.19.2 explains why a rule such as φ → p is p₁-implied by Δi: φ is an
inconsistent formula with respect to Δi, hence any rule with φ as antecedent will
be trivially p₁-implied by Δi.

This deficiency of p₁-implication is removed in p₂-implication, our second
proposal for reasoning with p-inconsistent knowledge bases. The intuition behind
p₂-implication is that a rule is considered “implied” only if its negation would
introduce a new p-inconsistency into the knowledge base. Previous p-inconsistencies
are thus considered as programming glitches and are simply ignored.

Definition 2.20 (p₂-Implication) Given a set Δ, we say that φ → ψ is
p₂-implied by Δ, written Δ ⊨_{p₂} φ → ψ, iff φ → ψ is not an inconsistent rule with
respect to Δ (see Def. 2.14) but its negation φ → ¬ψ is.

□ 

The requirement that not both φ → ψ and φ → ¬ψ be inconsistent serves two
purposes. First, as with p-entailment, it constitutes a safeguard against rules
being trivially implied by virtue of their antecedents being false. Second, if both
rules are inconsistent, the contradiction that originates when either is added to
Δ must previously have been embedded in Δ and therefore cannot be new. In
our previous example, the rules p ∧ b → ¬f, b → ¬p, and φ → σ are p₂-implied
by Δi; however, contrary to p₁-implication, the rules φ → ψ, φ → ¬ψ, φ → ¬σ,
and φ → p are not. As stated in Theorem 2.23, p₂-implication is strictly stronger
than p₁-implication and is equivalent to p-entailment if the set Δ is p-consistent.

Since the notion of p₂-implication is based on the concept of an inconsistent
rule (Def. 2.14), there is strong evidence that any procedure for deciding
p₂-implication will be exponential (see Sec. 2.4). To obtain a more efficient
decision procedure, we propose to weaken the definition of an inconsistent rule.
Instead of testing whether a given rule is responsible for a p-inconsistency, we
will test whether the rule is responsible for creating an inconsistent formula (see
Thm. 2.19).

Definition 2.21 (Weakly inconsistent rule) The rule r is weakly inconsistent
with respect to a set Δ iff there exists an unconfirmable subset Δu of
Δ ∪ {r}, such that Δ̂u ⊨ ¬φ but Δ̂'u ⊭ ¬φ, where Δ'u = Δu − {r} and φ is
the antecedent of some rule in Δu.

□

¹¹ Recall that Δ̂ denotes the conjunction of the material counterparts of the conditional rules
in Δ.

This leads naturally to the notion of weak p₂-implication.

Definition 2.22 (wp₂-Implication) Given a set Δ, a rule φ → ψ is
wp₂-implied by Δ, written Δ ⊨_{wp₂} φ → ψ, iff φ → ¬ψ is weakly inconsistent with
respect to Δ.

□ 

As in both p₁- and p₂-implication, the set Δi = Dp ∪ Di wp₂-implies the rules
p ∧ b → ¬f, b → ¬p, and φ → σ. More importantly, contrary to p₁-implication
(but similar to p₂-implication), the undesirable rules φ → ¬σ and φ → p are
not wp₂-implied by Δi and, in general, wp₂-implication will not sanction a rule
merely because its antecedent is inconsistent. However, unlike p₂-implication,
wp₂-implication will sanction any rule whose consequence is the negation of an
inconsistent formula (for example, p → ¬φ).

The notion of wp₂-implication is situated somewhere between p₁-implication
and p₂-implication, as the next two theorems indicate. It rests semantically on
both, since it requires the concepts of inconsistent formulas and inconsistent rules.
It also preserves some of the computational advantages of p₁-implication.

Theorem 2.23 1. Given a p-consistent set Δ, the notions of p-entailment,
p₁-implication, wp₂-implication, and p₂-implication are equivalent.

2. Given a p-inconsistent set Δ, p₂-implication is strictly stronger than
wp₂-implication, and wp₂-implication is strictly stronger than p₁-implication.

Theorem 2.24 If the set Δ is acyclic and of Horn form, wp₂-implication can be
decided in polynomial time.

The need to search for a suitable unconfirmable subset Δu (see Def. 2.22) results
in wp₂-implication being computationally harder than p₁-implication.


2.7  Discussion 

We have formalized a norm of consistency for mixed sets of conditionals,
ensuring that every group of rules is satisfiable in a non-trivial way, one in which
the antecedent and the consequent of at least one rule are both true. We showed that
any group of rules not satisfiable this way must contain conflicts that cannot be
reconciled by appealing to exceptions or ambiguities and is thus normally
considered contradictory (i.e., unfit to represent world knowledge). Using this norm,
we devised an effective procedure to test for inconsistencies and established a
tight relation between entailment and consistency, permitting entailment to be
decided by using consistency tests. These tests were shown to require
polynomial complexity relative to propositional satisfiability. We also discussed
ways of drawing conclusions from inconsistent knowledge bases and of uncovering
sets of rules directly responsible for such inconsistencies.

One of the key requirements in our definition of consistency is that no
conditional rule in Δ should have an impossible antecedent and, moreover, that
no antecedent should become absolutely impossible as exceptions (to defeasible
rules) become less likely (i.e., as ε becomes smaller). This requirement reflects
our understanding that it is fruitless to build knowledge bases for nonexisting
classes and counterintuitive to deduce (even defeasibly) conditional rules
having impossible antecedents. Consequently, pairs such as {φ → ψ, φ → ¬ψ} or
{φ ⇒ ψ, φ ⇒ ¬ψ} are labeled inconsistent and treated as unintentional mistakes.
The main application of the procedures proposed in this chapter is to alert users
and knowledge providers to such glitches, in order to prevent undesirable
inferences.

This  chapter  also  presents  a  new  formalization  for  strict  conditional  rules, 
within  the  analysis  of  probabilistic  consistency,  that  is  totally  distinct  from  their 
material  counterparts.  The  importance  of  this  distinction  has  been  recognized 
by  several  researchers  (see  [104,  23,  35,  37]  and  others)  and  has  both  theoretical 
and  practical  implications. 

In  ordinary  discourse,  conditionals  are  recognized  by  universally  quantified 
subsumptions  such  as  “all  penguins  are  birds”  or,  in  the  case  of  ground  rules, 
by  the  use  of  the  English  word  “if”  (e.g.,  “if  Tweety  is  a  penguin,  then  she  is  a 
bird”).  The  function  of  these  indicators  is  to  alert  the  listener  that  the  assertion 
made  is  not  based  on  evidence  pertaining  to  the  specific  individual,  but  rather 
on  generic  background  knowledge  pertaining  to  the  individual’s  class  (e.g.,  being 
a  penguin).  It  is  this  pointer  to  the  background  information  that  is  lost  if  a 
conditional  rule  is  encoded  as  a  Boolean  expression,  and  it  is  this  information 
that  is  crucial  for  adequately  processing  specificity  preferences. 

Intuitively,  background  knowledge  encodes  the  general  tendency  of  things  to 
happen  (i.e.,  relations  that  hold  true  in  all  worlds)  while  evidential  knowledge 
describes  that  which  actually  happened  (i.e.,  relations  in  our  particular  world). 
Thus, conditional rules, both defeasible and indefeasible, play a role similar to
that of meta-inference rules: They tell us how to draw conclusions from specific
observations about a particular situation or a particular individual, but do not
themselves convey such observations. It is for this reason that we chose to use
a separate connective, $\Rightarrow$, to denote strict conditionals, as is done in [58] in the
context of inheritance networks. Strict conditionals, by virtue of pointing to
generic background knowledge, are treated as part of the knowledge base, while
propositional formulas, including material implications, are used to formulate
queries but are excluded from the knowledge base itself. As a result, the rule
$p \Rightarrow b$ is treated as a constraint over the set of admissible probability assignments,
while the propositional formula $p \supset b$ is treated as specific evidence or a specific
observation on which these probability assignments are to be conditioned.

It does indeed make a profound difference whether our knowledge of Tweety's
birdness comes from generic background knowledge about penguins or from
specific observations conducted on Tweety. In natural language, the latter case
would normally be phrased by nonconditional rules such as “it is not true that
Tweety is both a penguin and a non-bird”, which is equivalent to the material
implication $penguin(Tweety) \supset bird(Tweety)$.

The practical aspects of this distinction can best be demonstrated using the
penguin example (Ex. 2.2).^12 Assume we know that “typically birds fly” and
“typically penguins do not fly”. If we are told “Tweety is a penguin” and “all
penguins are birds”, we would like to conclude that Tweety does not fly. By the
same token, if we are told “Tweety is a bird” and “all birds are penguins”, we
would have to conclude that Tweety does fly. However, note that both $\{p, p \supset b\}$
and $\{b, b \supset p\}$ are logically equivalent to $\{p, b\}$, which totally ignores the relation
between penguins and birds and should yield identical conclusions regardless of
whether penguins are a subclass of birds or the other way around. Thus, when
class subsumption is encoded as a material implication, it is permitted to
combine with properties attributed to individuals, and this crucial
information gets lost.
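
The logical equivalence claimed above can be checked mechanically. The following is a minimal sketch, assuming a two-atom language with `p` (penguin) and `b` (bird); the helper names `models` and `implies` are introduced here only for illustration.

```python
from itertools import product

def implies(x, y):
    # Material implication: x ⊃ y is (not x) or y.
    return (not x) or y

def models(formulas):
    """Return the set of (p, b) truth assignments satisfying every formula."""
    return {
        (p, b)
        for p, b in product([False, True], repeat=2)
        if all(formula(p, b) for formula in formulas)
    }

penguin_is_bird = [lambda p, b: p, lambda p, b: implies(p, b)]   # {p, p ⊃ b}
bird_is_penguin = [lambda p, b: b, lambda p, b: implies(b, p)]   # {b, b ⊃ p}
plain_facts = [lambda p, b: p, lambda p, b: b]                   # {p, b}

# All three sets have exactly the same models, so no propositional reasoner
# can tell which class was declared to subsume the other.
same_models = (
    models(penguin_is_bird) == models(bird_is_penguin) == models(plain_facts)
)
```

Since the three sets share a single model, any inference procedure that sees only these propositional encodings must answer every query identically for both, which is the loss of information described above.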

This distinction was encoded in [37] by placing strict conditionals together
with defaults in a background context, separate from the evidential set which was
reserved for observations made on a particular state of affairs. In [72, p. 212] it
is stated that “dealing with hard constraints, in addition to soft ones, involves
relativizing to some given set of tautologies”. Here, again, strict conditionals and
ground formulas would receive different treatment; only the former are permitted
to influence rankings among worlds. The separate connective used in this
chapter makes this distinction clear and natural, and the uniform probabilistic
semantics given to both strict and defeasible rules adequately captures
the notion of consistency in systems containing such mixtures.

^12 Taken from [37].

In Chapter 5, the distinction between $\supset$ and $\Rightarrow$ is crucial since rules are used
to impose a Markovian condition of independence among the atomic propositions
in the language, in order to induce causal interpretations, while wffs are used to
provide the context of a query.

The notion of p-entailment is known to yield a rather conservative set of
conclusions. For example, let $\Delta = \{a \to b\}$, and let $a$, $b$, $c$ be atomic propositions
in $\mathcal{L}$. It seems reasonable to expect $\Delta \models_p c \wedge a \to b$ simply because $c$
represents an irrelevant proposition, one with no relation to our knowledge base $\Delta$.
Yet, the rule $c \wedge a \to b$ is not p-entailed by $\Delta$. The reason is that the notion
of p-entailment requires $P(b \mid a \wedge c)$ to attain arbitrarily high probability in all
those probability distributions in which $P(b \mid a)$ attains arbitrarily high
probability. A probability distribution $P'$ where $P'(b \mid a) > 1 - \epsilon$ (for every $\epsilon > 0$) and
yet $P'(b \mid a \wedge c) = 0$ can be easily built.^13 For this reason, p-entailment is not
proposed as a complete characterization of defeasible reasoning. It nevertheless
yields a core of plausible consequences that should be maintained in every system
that reasons defeasibly [97, 98]. Similar problems with irrelevance are shared by
all other proposals for nonmonotonic reasoning based on a conditional
interpretation of the rules (see, for example, [23, 69, 14, 36]). Extensions of p-entailment
presenting solutions to these problems will be explored in Chapters 3, 4, and 5.^14
Nonprobabilistic extensions can be found in [72, 36, 14]. All these formalisms,
as well as circumscription [88], default logic [108], and argument-based systems
[84, 58], could benefit from a preliminary test of consistency such as the one
proposed in this chapter.
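
One way to build such a distribution $P'$ is sketched below. The particular numbers are an assumption chosen for illustration (any distribution that puts all of the $a \wedge c$ mass on $\neg b$ worlds would do); worlds are triples `(a, b, c)`, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def build_distribution(eps):
    """Worlds are (a, b, c) triples; worlds not listed have probability 0."""
    return {
        (True, True, False): 1 - eps / 2,   # a, b, not c: the normal case
        (True, False, True): eps / 2,       # a, not b, c: the rare case
    }

def prob(dist, event):
    """Probability of the set of worlds satisfying `event`."""
    return sum(p for world, p in dist.items() if event(*world))

def cond(dist, target, given):
    """Conditional probability P(target | given), assuming P(given) > 0."""
    joint = prob(dist, lambda a, b, c: target(a, b, c) and given(a, b, c))
    return joint / prob(dist, given)

eps = Fraction(1, 1000)
dist = build_distribution(eps)

p_b_given_a = cond(dist, lambda a, b, c: b, lambda a, b, c: a)
p_b_given_a_and_c = cond(dist, lambda a, b, c: b, lambda a, b, c: a and c)
```

Here `p_b_given_a` equals $1 - \epsilon/2$, which exceeds $1 - \epsilon$ for every positive $\epsilon$, while `p_b_given_a_and_c` is exactly zero because the only world satisfying $a \wedge c$ falsifies $b$.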


^13 This is not surprising since $a \to b$ does not say much about the worlds for $c$ or $\neg c$.
^14 Chapters 3 and 4 are independent of each other and can be read in any order.

CHAPTER 3


Plausibility  I:  A  Maximum  Entropy  Approach 

3.1  Introduction 

This  chapter  proposes  an  approach  to  nonmonotonic  reasoning  that  combines  the 
principle  of  infinitesimal  probabilities  (described  in  Chap.  2)  with  the  principle 
of  maximum  entropy  in  order  to  extend  the  inferential  power  of  the  probabilistic 
interpretation of defaults. As pointed out in Sections 1.1 and 1.2,
conditional-based approaches (such as p-entailment) fail to sanction some desirable patterns
which are readily sanctioned in common discourse. Their main weakness stems
from the failure to properly handle irrelevant information (see Sec. 2.7 and [101,
38]). Recent extensions (Delgrande’s proposal [23], rational closure [72], and
system-Z [100]) to the conditional-based approaches were successful in capturing
some aspects of irrelevance, but still suffer from the ills of conservatism, namely,
they fail to sanction property inheritance from classes to exceptional subclasses.
For example, given that “birds fly”, “birds have beaks”, and “penguin-birds don’t
fly”, these extensions fail to conclude that penguin-birds have beaks (despite
their being exceptional relative to flying). The maximum entropy formalism
described in this chapter is proposed as a well-disciplined approach to extracting
implicit (probability-based) independencies from fragments of knowledge, so
as to overcome those ills. In this respect, the resulting formalism combines the
virtues of both the extensional and the conditional-based approaches (see Sec. 1.2
for a review of both approaches to default reasoning).

The  connection  between  maximizing  entropy  and  minimizing  dependencies 
has  been  recognized  by  several  workers  [63,  123]  and  was  proposed  for  default 
reasoning  by  Pearl  [97,  p.  491].  The  origin  of  this  connection  lies  in  statistical 
mechanics, where the entropy approximates (the logarithm of) the number of
distinct configurations (assignments of properties to individuals) that comply with
certain constraints [12]. For example, if we observe that in a certain population
the proportion of tall individuals is $p$ and the proportion of smart individuals is $q$,
then out of all configurations that comply with these observations, those in which
the proportion of smart-and-tall individuals is $pq$ (as dictated by the assumption
of independence) constitute the greatest majority; any other proportion of
smart-and-tall individuals would permit the realization of fewer configurations, and will
correspond to a lower entropy value.
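
The counting argument can be made concrete with a small population. The sketch below assumes 100 individuals with $p = 0.5$ and $q = 0.4$ (numbers chosen only for illustration) and counts, for each possible size of the smart-and-tall group, how many labelings of the individuals are compatible with the observed proportions.

```python
from math import comb

def count_configurations(n, tall, smart, both):
    """Number of ways to label n individuals so that exactly `tall` are tall,
    `smart` are smart, and `both` are simultaneously tall and smart."""
    # Choose who is tall, then which tall individuals are smart,
    # then which non-tall individuals are smart.
    return comb(n, tall) * comb(tall, both) * comb(n - tall, smart - both)

n, tall, smart = 100, 50, 40          # p = 0.5, q = 0.4

# Feasible sizes of the smart-and-tall group.
feasible = range(max(0, tall + smart - n), min(tall, smart) + 1)

# The overlap admitting the largest number of configurations.
best_overlap = max(feasible, key=lambda k: count_configurations(n, tall, smart, k))
```

For these numbers the count peaks at an overlap of 20 individuals, that is, at the proportion $pq = 0.2$ dictated by independence.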

In physics, a configuration stands for the assignment of each particle to a
particular cell in position-momentum space, and all distinguishable configurations
can be assumed to have equal a priori probabilities. Therefore, the maximum
entropy distribution also represents the most likely distribution, that is, the
distribution most likely to be found in nature. Indeed, the celebrated distributions
of Maxwell-Boltzmann and Fermi-Dirac, both maximizing the entropy under the
appropriate assumptions, have been observed to hold with remarkable accuracy
and stability.

A  similar  argument  can  be  invoked  to  justify  the  use  of  maximum  entropy 
in  reasoning  applications  where  possible  worlds  play  the  role  of  configurations, 
and  the  constraints  are  given  by  observed  statistical  proportions,  for  example, 
“90%  of  all  birds  fly”  (see  Bacchus  et  al.  [7]).  Likewise,  if  we  assume  that  default 
expressions  are  qualitative  abstractions  of  probabilities,  and  that  probabilities 
are  degrees  of  belief  that  reflect  proportions  in  an  agent’s  experience,  then  it 
is  also  reasonable  to  assume  that  our  interpretation  of  defaults,  as  manifested 
in  discourse  conventions,  is  governed  by  principles  similar  to  that  of  maximum 
entropy.  In  view  of  the  “most  likely”  status  of  the  maximum  entropy  distribution, 
it  is  quite  possible  that  discourse  conventions  have  evolved  to  conform  with  the 
maximum entropy principle for pragmatic reasons; conformity with this principle
assures conformity with the highest number of experiences consistent with the
available defaults.

The chapter is organized as follows: Section 3.2 recasts the notions behind
p-entailment in terms of consequence relations and parameterized probability
distributions (PPD).^1 This reformulation has the advantages of conceptual simplicity
and expressiveness. Each PPD will induce a consequence relation $\phi \mid\!\sim \sigma$ on wffs.
A query such as “is $\sigma$ plausible in the context of $\phi$, given the knowledge base
$\Delta$” will then be evaluated in terms of the set of consequence relations induced
by the PPDs admissible with the constraints in $\Delta$. In this manner, each
probability model for a given knowledge base can be characterized and compared in
terms of the plausible conclusions it sanctions. By the same token, comparisons
with other formalisms are also facilitated. It is shown that this reformulation
preserves all the properties of p-entailment (see, for example, Thm. 3.10), and
that consequence relations are enhanced with a desirable property called Rational
Monotony.^2 Section 3.3 is concerned with the symbolic machinery necessary to
compute the consequence relation associated with the maximum entropy
distribution, and develops such machinery for a class of default rules called minimal-core
sets. Section 3.4 gives examples illustrating the behavior of the consequence
relations that result from maximizing entropy. Section 3.5 discusses issues
related to non-minimal-core sets, and Section 3.6 summarizes the main results.

^1 This special set of distributions has a smoothness property necessary for the
computation of the maximum entropy distribution (see Def. 3.1).

A  step-by-step  application  of  the  Lagrange  multipliers  technique  can  be  found 
in  Appendix  B,  while  Appendix  A  contains  proofs  of  the  main  theorems  and 
propositions. 

3.2  Parameterized  Probability  Distributions 

The definition below restricts the set of acceptable probability distributions to
those that present an analytic property around $\epsilon = 0$. This restriction is
convenient for computing maximum entropy and for introducing the concepts of
consequence relations (see Def. 3.4) and rankings (see Def. 3.7). As is
demonstrated in Theorems 3.3 and 3.10, this restriction does not affect the notions of
p-consistency and p-entailment introduced in Chapter 2.

Definition 3.1 (Parameterized probability distribution) A parameterized
probability distribution (PPD) is a collection $\{P_\epsilon\}$ of probability measures over the
set of possible worlds, indexed by a parameter $\epsilon$. $\{P_\epsilon\}$ assigns to each possible
world $\omega$ a function of $\epsilon$, $P_\epsilon(\omega)$, such that:

1. $P_\epsilon(\omega) \geq 0$ for all $\epsilon > 0$, and

$$\sum_{\omega} P_\epsilon(\omega) = 1 \quad \text{for all } \epsilon > 0. \tag{3.1}$$

2. For every $\omega$, $P_\epsilon(\omega)$ is analytic at $\epsilon = 0$. In other words, PPDs can be
expanded as a Taylor series about zero.

□

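For each formula $\varphi \in \mathcal{L}$, $P_\epsilon(\varphi)$ is defined as

$$P_\epsilon(\varphi) = \sum_{\omega \models \varphi} P_\epsilon(\omega), \tag{3.2}$$

and for each $\psi$ and $\varphi$ in $\mathcal{L}$, the conditional probability $P_\epsilon(\psi \mid \varphi)$ is defined as

$$P_\epsilon(\psi \mid \varphi) = \begin{cases} 1 & \text{if } P_\epsilon(\varphi) = 0 \\[4pt] \dfrac{P_\epsilon(\psi \wedge \varphi)}{P_\epsilon(\varphi)} & \text{otherwise} \end{cases} \tag{3.3}$$

For simplicity, $P_\epsilon$ will be used when referring to the PPD (strictly, $\{P_\epsilon\}$) in the
remainder of the chapter, wherever the distinction is clear from the context.

^2 Ginsberg [44], however, argues against Rational Monotony.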


Definition 3.2 (Consistency) A given $P_\epsilon$ is admissible with respect to a set
$D$ iff for each $\varphi_i \to \psi_i \in D$,

$$\lim_{\epsilon \to 0} P_\epsilon(\psi_i \mid \varphi_i) = 1 \tag{3.4}$$

and $P_\epsilon(\varphi_i) > 0$. $D$ is said to be consistent iff there exists at least one admissible
$P_\epsilon$ with respect to $D$.^3
□


Theorem  3.3  A  set  D  is  consistent  iff  D  is  p-consistent. 

It follows that procedure Test_Consistency outlined in Figure 2.1 can be used to
check consistency as defined in Definition 3.2. The definitions above provide
a semantical interpretation for each default rule $\varphi \to \psi$ in $D$ in terms of infinitesimal
conditional probabilities, where $\psi$ accrues an arbitrarily high likelihood whenever
$\varphi$ is all that is known.
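
Admissibility in the sense of Definition 3.2 can be observed numerically on the penguin rules $b \to f$, $p \to \neg f$, and $p \to b$. The sketch below is only an illustration: the table `EXPONENT` is a hand-picked assumption (each world $(p, b, f)$ receives the unnormalized weight $\epsilon^{n}$), and evaluating at a few small values of $\epsilon$ suggests, but does not prove, the limit in Eq. 3.4.

```python
# Hypothetical exponents n(w): world w = (p, b, f) gets unnormalized weight eps**n(w).
EXPONENT = {
    (True, True, True): 2, (True, True, False): 1,
    (True, False, True): 4, (True, False, False): 2,
    (False, True, True): 0, (False, True, False): 1,
    (False, False, True): 0, (False, False, False): 0,
}

# Defaults of the penguin example as (antecedent, consequent) pairs.
RULES = {
    "b -> f": (lambda p, b, f: b, lambda p, b, f: f),
    "p -> not f": (lambda p, b, f: p, lambda p, b, f: not f),
    "p -> b": (lambda p, b, f: p, lambda p, b, f: b),
}

def ppd(eps):
    """Return the normalized distribution P_eps over all eight worlds."""
    weights = {w: eps ** n for w, n in EXPONENT.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}

def conditional(dist, psi, phi):
    """P(psi | phi), returning 1 when P(phi) = 0 as in Eq. 3.3."""
    p_phi = sum(x for w, x in dist.items() if phi(*w))
    if p_phi == 0:
        return 1.0
    return sum(x for w, x in dist.items() if phi(*w) and psi(*w)) / p_phi

def conditionals_at(eps):
    """Conditional probability of each rule's consequent given its antecedent."""
    dist = ppd(eps)
    return {name: conditional(dist, psi, phi) for name, (phi, psi) in RULES.items()}

for eps in (1e-1, 1e-2, 1e-4):
    print(eps, conditionals_at(eps))
```

As $\epsilon$ shrinks, each of the three conditional probabilities moves toward one while every antecedent keeps positive probability, which is the behavior Definition 3.2 requires of an admissible $P_\epsilon$.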

As Theorem 3.10 will show, Defs. 3.1 and 3.4 recast the notions of p-entailment
in terms of consequence relations. The study of nonmonotonic and default
reasoning in terms of consequence relations was first proposed by Gabbay [32] and
further explored in [85, 69, 72].

Definition 3.4 (Consequence relations) Every $P_\epsilon$ induces a unique
consequence relation $\mid\!\sim$ on formulas, defined as follows:

$$\phi \mid\!\sim \sigma \quad \text{iff} \quad \lim_{\epsilon \to 0} P_\epsilon(\sigma \mid \phi) = 1 \tag{3.5}$$

$\phi \mid\!\sim \sigma$ is proper in $P_\epsilon$ if $P_\epsilon(\phi) > 0$ for all $\epsilon > 0$. The set of proper $\phi \mid\!\sim \sigma$ will be
called the proper consequence relation of $P_\epsilon$.

□

^3 Note that we are only dealing with defeasible rules. The generalization to strict rules
follows immediately from the concepts introduced in Chapter 2. We just need to augment the
conditions for admissibility in Def. 3.2 to include the requirement that for each $\varphi_j \Rightarrow \sigma_j \in S$,
$P_\epsilon(\sigma_j \mid \varphi_j) = 1$.

Note that a given $P_\epsilon$ is admissible with respect to a set $D$ iff for each $\varphi_i \to \psi_i \in D$,
$\varphi_i \mid\!\sim \psi_i$ belongs to the proper consequence relation of $P_\epsilon$.

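Among the general rules of inference that a commonsense consequence
relation might be expected to obey, the following have been proposed [37, 69, 85]:^4

(Logic) If $\varphi \vdash \psi$, then $\varphi \mid\!\sim \psi$.

(Cumulativity) If $\varphi \mid\!\sim \psi$, then $\varphi \mid\!\sim \gamma$ iff $\varphi \wedge \psi \mid\!\sim \gamma$.

(Cases) If $\varphi \mid\!\sim \psi$ and $\gamma \mid\!\sim \psi$, then $\varphi \vee \gamma \mid\!\sim \psi$.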

Kraus et al. [69] introduce a class of preferential models and show that each
preferential model satisfies the three laws given above. Moreover, they show
that every consequence relation satisfying those laws can be represented as a
preferential model.^5 Analogous results were shown in [72] with respect to the
class of ranked preferential models and the set of rules above augmented by the
rule of rational monotony:

$$\text{If } \varphi \mid\!\sim \psi \text{ and } \varphi \mid\!\not\sim \neg\gamma, \text{ then } \varphi \wedge \gamma \mid\!\sim \psi. \tag{3.6}$$

Theorems 3.5 and 3.6 formalize the relation between a consequence relation
induced by a PPD and a consequence relation satisfying the rules of inference above.
In particular, Theorem 3.6, which follows from the results in [74], constitutes a
representation theorem stating that any finite consequence relation that satisfies
logic, cumulativity, cases, and rational monotony can be represented by a PPD.^6

Theorem  3.5  A  PPD  consequence  relation  satisfies  the  logic,  cumulativity,  cases, 
and  rational  monotony  rules  of  inference. 

Theorem  3.6  Every  PPD  consequence  relation  can  be  represented  as  a  ranked 
preferential  model,  and  every  ranked  preferential  model  with  a  finite  non-empty 
state  space  can  be  represented  as  a  PPD  consequence  relation. 

We remark that $\epsilon$-semantics, as defined in [37, 97], does not comply in general
with the rule of rational monotony. This is because $\epsilon$-entailment was defined
as the intersection of the consequence relations induced by all admissible $P_\epsilon$,
and $P_\epsilon$ was not restricted to analytic functions. Thus, the set of admissible
$P_\epsilon$'s included discontinuous functions, for which the $\lim_{\epsilon \to 0}$ in Eq. 3.5 does not
exist. By restricting the probability distributions to be analytic at $\epsilon = 0$, we
can guarantee that each one of their consequence relations satisfies the rule of
rational monotony.

^4 Geffner [37] proposes an additional rule of inference: If $\varphi \to \psi \in D$, then $\varphi \mid\!\sim \psi$. This
rule establishes a connection between the defaults in the knowledge base and the consequence
relation $\mid\!\sim$. Semantically, this connection is established in this dissertation by interpreting
defaults in $D$ as constraints over rankings (see Defs. 3.2 and 3.4).

^5 They actually use a slightly different set of inference rules which can be shown to be
equivalent to those above.

^6 Similar results have been obtained independently by Satoh [113].

As $\epsilon$ approaches zero, the Taylor series expansion of $P_\epsilon$ is dominated by the
first term whose coefficient is non-zero. Thus, we can define a ranking function
on possible worlds using the exponent of this dominant term as follows.

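Definition 3.7 (Ranking) Given a $P_\epsilon$, the ranking function $\kappa_{P_\epsilon}(\omega)$ is defined
as

$$\kappa_{P_\epsilon}(\omega) = \begin{cases} \min\{n \text{ such that } \lim_{\epsilon \to 0} P_\epsilon(\omega)/\epsilon^{n} \neq 0\} & \text{if } P_\epsilon(\omega) > 0 \\ \infty & \text{if } P_\epsilon(\omega) = 0 \end{cases} \tag{3.7}$$

□

Moreover, according to Eq. 3.2, $P_\epsilon$ also induces a ranking on formulas $\varphi$:

$$\kappa_{P_\epsilon}(\varphi) = \min_{\omega \models \varphi} \kappa_{P_\epsilon}(\omega) \tag{3.8}$$

Proposition 3.8 The following are consequences of Defs. 3.1, 3.4, and 3.7.
Given $P_\epsilon$:

1. There is at least one possible world $\omega$ such that $\kappa_{P_\epsilon}(\omega) = 0$.

2. $\phi \mid\!\sim \sigma$ holds in $P_\epsilon$ iff either $\kappa_{P_\epsilon}(\phi \wedge \sigma) < \kappa_{P_\epsilon}(\phi \wedge \neg\sigma)$ or $\kappa_{P_\epsilon}(\phi) = \infty$.

3. We will say that $\kappa_{P_\epsilon}$ is admissible with respect to $D$ if for each $\varphi_i \to \psi_i \in D$

$$\kappa_{P_\epsilon}(\varphi_i \wedge \psi_i) < \kappa_{P_\epsilon}(\varphi_i \wedge \neg\psi_i). \tag{3.9}$$

$D$ is consistent iff there exists at least one admissible ranking $\kappa_{P_\epsilon}$ with
respect to $D$.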

Expressed in terms of ranking functions, the consequence relation induced by
a PPD echoes the preferential interpretation for defeasible sentences advocated
in [117] according to which $\psi$ should hold in all minimal (preferred, more normal)
models for $\varphi$. This can be seen more clearly by writing Eq. 3.9 as $\kappa(\varphi) < \kappa(\varphi \wedge \neg\psi)$
and recalling that, in our case, minimality (preference or normality) is reflected
in having the lowest possible ranking (i.e., the highest possible likelihood).

The rankings induced by $P_\epsilon$ will prove useful in the maximum entropy
computation process. Rather than computing the PPD of maximum entropy, $P^*$,
we will calculate its corresponding ranking $\kappa_{P^*}$ (denoted $\kappa^*$), from which we can
compute the consequence relation associated with $P^*$.
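
Once a ranking is available, the consequence relation can be read off directly from Eq. 3.8 and Proposition 3.8. The following sketch assumes a language with three atoms `(p, b, f)` and a hand-picked ranking `KAPPA` (an illustrative assumption, not the maximum entropy ranking derived later); the function names are likewise introduced here for illustration.

```python
import math
from itertools import product

WORLDS = list(product([True, False], repeat=3))   # each world is (p, b, f)

# Hypothetical ranking of worlds; lower rank means more normal.
KAPPA = {
    (True, True, True): 2, (True, True, False): 1,
    (True, False, True): 4, (True, False, False): 2,
    (False, True, True): 0, (False, True, False): 1,
    (False, False, True): 0, (False, False, False): 0,
}

def rank_of_formula(kappa, formula):
    """Eq. 3.8: the rank of a formula is the minimum rank of its models."""
    return min((kappa[w] for w in WORLDS if formula(*w)), default=math.inf)

def follows(kappa, phi, sigma):
    """Proposition 3.8(2): does phi |~ sigma hold under the ranking kappa?"""
    if rank_of_formula(kappa, phi) == math.inf:
        return True
    k_with = rank_of_formula(kappa, lambda *w: phi(*w) and sigma(*w))
    k_without = rank_of_formula(kappa, lambda *w: phi(*w) and not sigma(*w))
    return k_with < k_without

def is_admissible(kappa, rules):
    """Eq. 3.9: every default's verifying worlds outrank its falsifying ones."""
    return all(
        rank_of_formula(kappa, lambda *w: phi(*w) and psi(*w))
        < rank_of_formula(kappa, lambda *w: phi(*w) and not psi(*w))
        for phi, psi in rules
    )

PENGUIN_RULES = [
    (lambda p, b, f: b, lambda p, b, f: f),        # b -> f
    (lambda p, b, f: p, lambda p, b, f: not f),    # p -> not f
    (lambda p, b, f: p, lambda p, b, f: b),        # p -> b
]

penguin_birds_fly = follows(KAPPA, lambda p, b, f: p and b, lambda p, b, f: f)
penguin_birds_dont_fly = follows(KAPPA, lambda p, b, f: p and b, lambda p, b, f: not f)
```

Under this ranking the query "do penguin-birds fly?" is denied and its negation is confirmed, because the most normal world satisfying $p \wedge b$ is one in which $f$ is false.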

3.3  Plausible  Conclusions  and  Maximum  Entropy 

Given that $\phi$ is true, a formula $\sigma$ will be probabilistically entailed by $D$ if

$$\lim_{\epsilon \to 0} P_\epsilon(\sigma \mid \phi) = 1$$

in all $P_\epsilon$ that are admissible with respect to $D$.

Definition 3.9 (Probabilistic entailment) Given a consistent $D$, a formula
$\sigma$ is probabilistically entailed by $D$ given $\phi$, written $\phi \mid\!\sim_p \sigma$, iff $\phi \mid\!\sim \sigma$ is in the
proper consequence relation of all $P_\epsilon$ admissible with $D$.

□

As is expected, there is a close relation between p-entailment (Def. 2.7) and
probabilistic entailment:

Theorem 3.10 Given a consistent $D$, $\phi \mid\!\sim_p \sigma$ iff $D \models_p \phi \to \sigma$.

It follows that we can use the decision procedure for p-entailment (based on
procedure Test_Consistency in Figure 2.1) for deciding probabilistic entailment.^7
Probabilistic entailment yields (semimonotonically) the most conservative “core”
of plausible conclusions that one would wish to draw from a conditional database
if one is committed to avoiding inconsistencies [101]. In particular, it does not
permit chaining (from $a \to b$ and $b \to c$ conclude $a \mid\!\sim c$), or contraposition (from
$a \to b$ conclude $\neg b \mid\!\sim \neg a$), hence it is too weak to be proposed as a complete
characterization of defeasible reasoning. As was mentioned in Sections 1.1 and 2.7,
the reason for this conservative behavior lies in our insistence that any conclusion
must attain high probability in all probability distributions admissible with $D$.
Thus, given $D = \{p \to \neg f\}$ (“typically penguins don’t fly”) and the proposition
$bl$ (for “blue”), the conclusion $bl \wedge p \mid\!\sim \neg f$ (“blue penguins don’t fly”) will not
be sanctioned by probabilistic entailment, since one admissible distribution
reflects a world in which blue penguins do fly. Clearly, if we want the system to
respect the communication convention that, unless stated explicitly, properties
are presumed to be irrelevant to each other, we need to restrict the family of
probability distributions relative to which a given query is evaluated. In other
words, we should consider only distributions that minimize dependencies, that
is, they should contain the dependencies that are absolutely implied by $D$, but
no others.

^7 This notion of entailment is also equivalent to the notion of preferential entailment in [69],
even though preferential entailment was motivated by considering desirable features of the
consequence relations and not by probabilistic considerations. The relation between these two
notions was reported by Kraus et al. [69].

Since the maximum entropy distribution exhibits this minimal commitment
to dependencies, it will be the focal point of the inference procedure. The entropy
associated with a distribution $P_\epsilon$ is defined as

$$H[P_\epsilon] = -\sum_{\omega \in \Omega} P_\epsilon(\omega) \log P_\epsilon(\omega). \tag{3.10}$$
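
The link between Eq. 3.10 and minimal dependence can be illustrated on the tall/smart example of Section 3.1. The sketch below fixes the marginals at $p = 0.5$ and $q = 0.4$ (assumed numbers), scans the feasible values of the joint probability of smart-and-tall on a grid, and records the value with the highest entropy; the grid search is a stand-in for the analytic maximization and is not the procedure developed in this chapter.

```python
import math

def entropy(cells):
    """H = -sum P log P over the cells of a distribution, taking 0 log 0 as 0."""
    return -sum(x * math.log(x) for x in cells if x > 0)

def joint(p, q, both):
    """Joint distribution over (tall, smart) with marginals p and q.

    Cells: tall and smart, tall only, smart only, neither.
    """
    return [both, p - both, q - both, 1 - p - q + both]

p, q = 0.5, 0.4

# Feasible values of P(tall and smart) on a grid of width 0.001.
candidates = [i / 1000 for i in range(0, 401)]

# The candidate whose joint distribution has the highest entropy.
best = max(candidates, key=lambda x: entropy(joint(p, q, x)))
```

The maximizer coincides with the product $pq = 0.2$, so among all joint distributions compatible with the two marginals, the one of maximum entropy is the one in which tallness and smartness are independent.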

Given a set $D = \{\varphi_i \to \psi_i\}$, the objective is to compute the PPD among those
satisfying the constraints imposed by $D$ that maximizes the entropy;^8 $P^*$ denotes
this maximum entropy distribution. The formulas in the consequence relation
of $P^*$ (denoted by $\mid\!\sim^*$) are then taken as the plausible conclusions of $D$. In
Eq. 3.5, the default $\varphi_i \to \psi_i$ is interpreted as a constraint on the limit of $P_\epsilon(\psi_i \mid \varphi_i)$
as $\epsilon$ approaches zero. To facilitate the maximization of $H[P_\epsilon]$, these constraints
are replaced by equivalent constraints that assign a specific bound to $P_\epsilon(\psi_i \mid \varphi_i)$
for every $\epsilon > 0$. The (unique) maximum entropy distribution for each value of
$\epsilon > 0$ is then derived, and finally, its asymptotic solution as $\epsilon$ approaches zero is
examined.

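A PPD $P_\epsilon$ satisfies a rule $r_i : \varphi_i \to \psi_i$ iff

$$P_\epsilon(\psi_i \mid \varphi_i) \geq \frac{1}{1 + C_i \epsilon}, \tag{3.11}$$

where $C_i$ is an arbitrary positive coefficient independent of $\epsilon$. Accordingly, the
admissibility constraints (Eq. 3.5) are re-written as:

$$C_i \times \epsilon \times P_\epsilon(\psi_i \wedge \varphi_i) \geq P_\epsilon(\neg\psi_i \wedge \varphi_i). \tag{3.12}$$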
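
The passage from the bound on the conditional probability to Eq. 3.12 can be spelled out in a few lines. The derivation below is a sketch that reads the bound of Eq. 3.11 as $1/(1 + C_i \epsilon)$, assumes $P_\epsilon(\varphi_i) > 0$ (as required by admissibility in Def. 3.2), and uses only the definition of conditional probability in Eq. 3.3:

```latex
\begin{align*}
P_\epsilon(\psi_i \mid \varphi_i)
  = \frac{P_\epsilon(\psi_i \wedge \varphi_i)}
         {P_\epsilon(\psi_i \wedge \varphi_i) + P_\epsilon(\neg\psi_i \wedge \varphi_i)}
  &\geq \frac{1}{1 + C_i \epsilon} \\
\iff\quad (1 + C_i \epsilon)\, P_\epsilon(\psi_i \wedge \varphi_i)
  &\geq P_\epsilon(\psi_i \wedge \varphi_i) + P_\epsilon(\neg\psi_i \wedge \varphi_i) \\
\iff\quad C_i \times \epsilon \times P_\epsilon(\psi_i \wedge \varphi_i)
  &\geq P_\epsilon(\neg\psi_i \wedge \varphi_i).
\end{align*}
```

Both steps are reversible because the denominators are positive, so the two forms of the constraint are equivalent and not merely related by implication.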

As we shall see, the equations governing the ranking approximation will be
independent of $C_i$. If $\Omega_i^+$ denotes the set of possible worlds verifying the rule $r_i$ and $\Omega_i^-$
denotes the set of possible worlds falsifying the rule $r_i$, the constraint of Eq. 3.12
can be written

$$\sum_{\omega \in \Omega_i^-} P_\epsilon(\omega) - \Big[ C_i \times \epsilon \times \sum_{\omega \in \Omega_i^+} P_\epsilon(\omega) \Big] \leq 0. \tag{3.13}$$

^8 The PPD of maximum entropy must also satisfy the normalization constraint
$\sum_{\omega} P_\epsilon(\omega) = 1$.

The problem is to maximize Eq. 3.10 subject to the constraints given in
Eq. 3.13, one constraint for each rule in $D$. The most powerful method for solving
such optimization problems is the Lagrange multipliers technique [5]. This
technique associates a factor $\alpha$ with each constraint (rule) and yields a distribution
$P^*$ that is expressible as a product of these factors [18]. We will see that, under
the infinitesimal approximation (i.e., when $\epsilon$ is close to 0), $P^*(\omega)$ will be
proportional to the product of the factors ($\alpha$) associated only with those sentences
falsified in $\omega$.

At the point of maximum entropy, the status of a constraint such as Eq. 3.13
can be one of two types: active, when the constraint is satisfied as an equality,
and passive, when the constraint is satisfied as a strict inequality. Passive
constraints do not affect the point of maximum entropy and can be ignored (see [5]).
Unfortunately, the task of identifying the set of active constraints is in itself a
hard combinatorial problem. The analysis will begin by assuming a set of active
constraints; we will then provide a characterization of knowledge bases called
minimal-core sets (Def. 3.11), which are guaranteed to impose only active
constraints, and postpone the discussion of passive constraints until Section 3.5.

An application of the Lagrange multiplier technique on a set of $n$ active
constraints yields the following expression for each term $P_\epsilon(\omega)$ (see Appendix B for
a step-by-step derivation):^9

$$P_\epsilon(\omega) = \alpha_0 \times \prod_{r_i \in D_\omega^-} \alpha_{r_i} \times \prod_{r_j \in D_\omega^+} \alpha_{r_j}^{(-C_j \epsilon)} \tag{3.14}$$

where $D_\omega^-$ denotes the set of rules falsified in $\omega$ and $D_\omega^+$ denotes the set of rules
verified in $\omega$. Motivated by Def. 3.4, we look for an asymptotic solution where
each $\alpha_{r_i}$ is proportional to $\epsilon^{Z(r_i)}$ for some non-negative integer $Z(r_i)$,^10 and thus
each term of the form $\alpha_{r_j}^{(-C_j \epsilon)}$ will tend to one as $\epsilon$ tends to zero. The term $\alpha_0$
is a normalization constant that will be present in each term of the distribution
and thus can be safely ignored. Using $P'_\epsilon$ to denote the unnormalized probability
function, and taking the limit as $\epsilon$ goes to zero, Eq. 3.14 yields:

$$P'_\epsilon(\omega) \approx \begin{cases} 1 & \text{if } D_\omega^- = \emptyset \\ \prod_{r_i \in D_\omega^-} \alpha_{r_i} & \text{otherwise} \end{cases} \tag{3.15}$$

^9 In Eq. 3.14, $\alpha_0 = e^{(\lambda_0 - 1)}$ and $\alpha_{r_k} = e^{\lambda_k}$, where $\lambda_0$ and $\lambda_k$ are the actual Lagrange
multipliers.

^10 We take a “bootstrapping” approach: if this assumption yields a solution, then the
uniqueness of $P^*$ will justify this assumption. Note that the assumption will be satisfied for analytic
functions (see Def. 3.1) which eliminate exponential dependencies on $\epsilon$.




Thus, the probability of a given possible world $\omega$ depends only on the rules that
are falsified by that possible world. In other words, any two possible worlds
that falsify the same set of rules are (asymptotically) equiprobable. Once the
$\alpha$-factors are computed, we can construct an asymptotic approximation of the
desired probability distribution (using the ranking functions of Def. 3.7) and
determine the consequence relation of $D$, according to maximum entropy.

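To compute the $\alpha$-factors we substitute the expression for each $P'_\epsilon(\omega)$ (Eq. 3.15)
in each of the active constraint equations (Eq. 3.13), obtaining:

$$\sum_{\omega \in \Omega_i^-} \Big[ \prod_{r_k \in D_\omega^-} \alpha_{r_k} \Big] = C_i \epsilon \times \sum_{\omega \in \Omega_i^+} \Big[ \prod_{r_j \in D_\omega^-} \alpha_{r_j} \Big] \tag{3.16}$$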

A few observations are in order. First, Eq. 3.16 constitutes a system of $n$ equations
(one for each active rule) with $n$ unknowns (the $\alpha$-factors; one for each active
rule). Second, since $P_\epsilon$ is analytic at $\epsilon = 0$, we can extract the dominant term
in the Taylor expansion of each element in the products of Eq. 3.16 and write
$\alpha_{r_i} \approx a_i \epsilon^{Z(r_i)}$ where $Z(r_i)$ is a non-negative integer. Our task reduces to that
of computing the $Z$'s.^11 Third, we can replace each summation by its dominant
term, namely, the term where $\epsilon$ has the minimal exponent. Thus, taking log on
both sides of Eq. 3.16 and writing $\log \alpha_{r_i} \approx \log a_i + Z(r_i) \log \epsilon \approx Z(r_i) \log \epsilon$ yields

$$\min_{\omega \in \Omega_i^-} \Big[ \sum_{r_k \in D_\omega^-} Z(r_k) \Big] = 1 + \min_{\omega \in \Omega_i^+} \Big[ \sum_{r_j \in D_\omega^-} Z(r_j) \Big], \qquad 1 \leq i \leq n, \tag{3.17}$$

where the minimization is understood to range over all possible worlds in $\Omega_i^-$ and
$\Omega_i^+$, respectively. Note that the constants $C_i$ do not appear in Eq. 3.17; they
interact only with the $a_i$ coefficients, which will be adjusted accordingly to match
the constraints in Eq. 3.12.

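Since rule $r_i$ is falsified in each possible world on the left-hand side of Eq. 3.17,
$Z(r_i)$ will appear in each one of the $\sum$-terms inside the min operation and can
be isolated:

$$Z(r_i) + \min_{\omega \in \Omega_i^-} \Big[ \sum_{\substack{r_k \in D_\omega^- \\ k \neq i}} Z(r_k) \Big] = 1 + \min_{\omega \in \Omega_i^+} \Big[ \sum_{r_j \in D_\omega^-} Z(r_j) \Big]. \tag{3.18}$$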

Although Eq. 3.18 offers a significant simplification over Eq. 3.16, it is still
not clear how to compute the values for the $Z$'s in the most general case. We now
introduce a class of rule sets $D$ for which a simple greedy strategy (see procedure
Z*_order in Figure 3.1) can be used to solve the set of equations above.

^11 Each probability term $P'_\epsilon(\omega)$ is asymptotically determined once the values of the $Z$'s are
computed (see Eq. 3.15).

Definition 3.11 (Minimal-Core set) $D$ is a minimal-core (MC) set iff for
each rule $r_i : \varphi_i \to \psi_i \in D$, its negation $\varphi_i \to \neg\psi_i$ is tolerated by $D - \{\varphi_i \to \psi_i\}$.
Equivalently, for each rule $r_i$ there is a possible world that falsifies $r_i$ and no other
rule in $D$.

□

For example, the set $D_p$ of Example 2.2 is MC: $\omega_1 \models p \wedge b \wedge f$ falsifies $p \to \neg f$,
$\omega_2 \models p \wedge b \wedge \neg f$ falsifies $b \to f$, and $\omega_3 \models p \wedge \neg b \wedge \neg f$ falsifies $p \to b$,
and each of these worlds falsifies no other rule. On the other hand, changing
$p \to \neg f$ to $p \to f$ renders $D_p$ no longer MC. Clearly,
deciding whether a set D is MC takes |D| satisfiability tests. The MC property
excludes sets D that contain redundant rules, that is, rules r that are entailed by
the $P^{*}$ computed for $D - \{r\}$. It follows that all default rules in an MC set are
active.
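
The MC test can be illustrated with a small brute-force sketch. It enumerates all truth assignments instead of issuing the |D| satisfiability tests mentioned above, so it is only meant for toy rule sets; the rule encoding (pairs of Python predicates) and all function names are assumptions made for this illustration.

```python
from itertools import product

def worlds(atoms):
    """Yield every truth assignment over `atoms` as a dict."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def falsifies(world, rule):
    """A world falsifies phi -> psi when phi holds and psi does not."""
    phi, psi = rule
    return phi(world) and not psi(world)

def is_minimal_core(rules, atoms):
    """Brute-force MC test: every rule needs a world that falsifies it alone."""
    for name in rules:
        alone = any(
            [n for n, r in rules.items() if falsifies(w, r)] == [name]
            for w in worlds(atoms)
        )
        if not alone:
            return False
    return True

# D_p of Example 2.2: penguins don't fly, penguins are birds, birds fly.
D_p = {
    "p->~f": (lambda w: w["p"], lambda w: not w["f"]),
    "p->b": (lambda w: w["p"], lambda w: w["b"]),
    "b->f": (lambda w: w["b"], lambda w: w["f"]),
}
# Variant discussed in the text: p -> ~f replaced by p -> f.
D_variant = {k: v for k, v in D_p.items() if k != "p->~f"}
D_variant["p->f"] = (lambda w: w["p"], lambda w: w["f"])

mc_original = is_minimal_core(D_p, ["p", "b", "f"])
mc_variant = is_minimal_core(D_variant, ["p", "b", "f"])
```

In the variant, any world falsifying p -> f must either falsify p -> b or, if b holds, falsify b -> f, which is why the test fails.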

Proposition 3.12 If D is an MC set, then, for all defaults $r : \varphi \to \psi \in D$,
$\varphi \mathrel{|\!\sim^{*}} \psi$ is not in the consequence relation induced by $D - \{\varphi \to \psi\}$.

The MC property guarantees that, for each rule $r_i \in D$, there is a possible world
$\omega_i$ in which only that rule is falsified. Thus, since the min operation on the
left-hand side of Eq. 3.18 ranges over all possible worlds $\omega$ in which $r_i$ is falsified, the
minimum is attained at $\omega_i$ (where the remaining sum is empty), and the constraint
equations for an MC set can be further simplified to

\[
Z(r_i) = 1 + \min_{\omega \in \Omega^{+}_{r_i}} \Big[ \sum_{r_j \in D^{-}_{\omega}} Z(r_j) \Big],
  \qquad 1 \le i \le n. \tag{3.19}
\]

Note that since the expression $\sum_{r_j \in D^{-}_{\omega}} Z(r_j)$ is equal to the exponent of the
most significant $\varepsilon$-term in $P^{*}(\omega)$, from Definition 3.7 we have that it is actually
equal to $\kappa_{P^{*}}(\omega)$, which we denote by $\kappa^{*}$. Thus, Eq. 3.19 can be rewritten as a
pair of coupled equations; the first,

\[
\kappa^{*}(\omega) = \sum_{r_i \in D^{-}_{\omega}} Z^{*}(r_i), \tag{3.20}
\]

assigns a ranking $\kappa^{*}$ to each possible world $\omega$ once we know the $Z^{*}$-values on
the rules. The second,

\[
Z^{*}(r_i) = \min_{\omega \models \varphi_i \wedge \psi_i} \kappa^{*}(\omega) + 1, \tag{3.21}
\]

assigns a value $Z^{*}(r_i)$ to each rule $r_i : \varphi_i \to \psi_i$ once we know the possible world
ranking $\kappa^{*}$. We have reduced the computation of the maximum entropy distribution
to finding a $Z^{*}$ function that is consistent with both Eqs. 3.20 and 3.21.
Moreover, given the $\kappa^{*}$-ranking, we can decide entailment by the criterion

\[
\varphi \mathrel{|\!\sim^{*}} \psi \quad \text{iff} \quad \kappa^{*}(\psi \wedge \varphi) < \kappa^{*}(\neg\psi \wedge \varphi). \tag{3.22}
\]
Procedure Z*_order

Input: A consistent MC set D. Output: Z*-ranking on rules.

1. Let $D_0$ be the set of rules tolerated by D.

2. For each rule $r_i : \varphi_i \to \psi_i \in D_0$, set $Z(r_i) = 1$ and $\mathcal{RZ}^{+} = D_0$.

3. While $\mathcal{RZ}^{+} \neq D$, do:

(a) Let $\Omega$ be the set of possible worlds $\omega$ such that $\omega$ falsifies rules only in
$\mathcal{RZ}^{+}$ and verifies at least one rule outside of $\mathcal{RZ}^{+}$; let $\mathcal{RZ}^{-}_{\omega}$ denote the
set of rules in $\mathcal{RZ}^{+}$ falsified by $\omega$.

(b) For each $\omega \in \Omega$, compute

\[
\kappa(\omega) = \sum_{r_i \in \mathcal{RZ}^{-}_{\omega}} Z(r_i). \tag{3.23}
\]

(c) Let $\omega^{*}$ be the possible world in $\Omega$ with minimum $\kappa$; for each rule
$r_i : \varphi_i \to \psi_i \notin \mathcal{RZ}^{+}$ that $\omega^{*}$ verifies, compute

\[
Z(r_i) = \kappa(\omega^{*}) + 1 \tag{3.24}
\]

and set $\mathcal{RZ}^{+} = \mathcal{RZ}^{+} \cup \{r_i\}$.

End Procedure

Figure 3.1: Procedure for computing the Z*-ordering on rules.

The apparent circularity between $\kappa^{*}$ and $Z^{*}$ (Eqs. 3.20 and 3.21) is benign.
Both  functions  can  be  computed  recursively  in  an  interleaved  fashion,  as  shown 
in  procedure  Z*_order  (Fig.  3.1). 
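
The interleaved computation can be sketched in a few lines of Python. This is a brute-force rendering of procedure Z*_order that enumerates all possible worlds, so it is exponential in the number of atoms and intended only for small examples. The rule encoding and the names `z_star_order`, `verifies`, and `falsifies` are assumptions of this sketch, and the error raised when no suitable world exists is an added safeguard that is not part of the procedure.

```python
from itertools import product

def worlds(atoms):
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def verifies(w, rule):
    phi, psi = rule
    return phi(w) and psi(w)

def falsifies(w, rule):
    phi, psi = rule
    return phi(w) and not psi(w)

def z_star_order(rules, atoms):
    """Sketch of procedure Z*_order for a consistent MC set (brute force)."""
    all_worlds = list(worlds(atoms))
    z = {}
    # Steps 1-2: a rule is tolerated by D if some world verifies it while
    # falsifying no rule of D; tolerated rules receive Z = 1.
    for name, rule in rules.items():
        if any(verifies(w, rule)
               and not any(falsifies(w, r) for r in rules.values())
               for w in all_worlds):
            z[name] = 1
    # Step 3: rank the remaining rules, cheapest admissible world first.
    while len(z) < len(rules):
        best = None  # (kappa, unranked rules verified by that world)
        for w in all_worlds:
            violated = [n for n, r in rules.items() if falsifies(w, r)]
            if any(n not in z for n in violated):
                continue  # w falsifies a rule that has no Z-value yet
            newly = [n for n, r in rules.items()
                     if n not in z and verifies(w, r)]
            if not newly:
                continue
            kappa = sum(z[n] for n in violated)  # Eq. 3.23
            if best is None or kappa < best[0]:
                best = (kappa, newly)
        if best is None:
            raise ValueError("no admissible world; is D a consistent MC set?")
        for n in best[1]:
            z[n] = best[0] + 1  # Eq. 3.24
    return z

# D_p of Example 2.2.
D_p = {
    "p->~f": (lambda w: w["p"], lambda w: not w["f"]),
    "p->b": (lambda w: w["p"], lambda w: w["b"]),
    "b->f": (lambda w: w["b"], lambda w: w["f"]),
}
z_p = z_star_order(D_p, ["p", "b", "f"])
```

On this input only b -> f is tolerated, and the single admissible world in the first iteration is the one satisfying p, b, and not f, which verifies both remaining rules at rank 1.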

Theorem  3.13  Given  an  MC  set  D,  procedure  Z*_order  computes  the  function 
Z*  defined  by  Eqs.  3.20  and  3.21. 

Corollary 3.14 Given an MC set D, the function Z* is unique.

The function $Z^{*}$ provides an economical way of storing the ranking $\kappa^{*}$: the space
requirement is linear in the number of default rules. Still, in order to decide
whether $\varphi \mathrel{|\!\sim^{*}} \psi$, we must check whether

\[
\min_{\omega \models \varphi \wedge \psi} \Big[ \sum_{r_k \in D^{-}_{\omega}} Z^{*}(r_k) \Big]
  < \min_{\omega \models \varphi \wedge \neg\psi} \Big[ \sum_{r_j \in D^{-}_{\omega}} Z^{*}(r_j) \Big] \tag{3.25}
\]

holds, and the minimization required by Eq. 3.25 is NP-hard even for Horn
expressions (see [9]).


3.4  Examples 

Example 3.1 (Blue penguins) Consider the set $D_p = \{p \to \neg f,\ p \to b,\ b \to f\}$
of Example 2.2. An application of procedure Z*_order will yield the ranking $Z^{*}$:

$Z^{*}(b \to f) = 1$
$Z^{*}(p \to b) = 2$
$Z^{*}(p \to \neg f) = 2$


Let bl be a new proposition denoting the color “blue”. Note that since $\kappa^{*}(\varphi)$
depends solely on the $Z^{*}$ of the rules violated in the preferred models of $\varphi$, and bl
is a proposition that does not appear in any of the defaults in $D_p$, it follows that
the defaults violated in the preferred models of $bl \wedge p \wedge \neg f$ (“blue penguins don’t
fly”) are the same as those violated in the preferred possible worlds of $p \wedge \neg f$.
We have:

\[
\kappa^{*}(bl \wedge p \wedge \neg f) = \min_{\omega \models bl \wedge p \wedge \neg f} \sum_{r_i \in D^{-}_{\omega}} Z^{*}(r_i) = \kappa^{*}(p \wedge \neg f) = 1
\]

\[
\kappa^{*}(bl \wedge p \wedge f) = \min_{\omega \models bl \wedge p \wedge f} \sum_{r_i \in D^{-}_{\omega}} Z^{*}(r_i) = \kappa^{*}(p \wedge f) = 2
\]

Thus, $\kappa^{*}(bl \wedge p \wedge \neg f) < \kappa^{*}(bl \wedge p \wedge f)$, which ratifies the conclusion
$bl \wedge p \mathrel{|\!\sim^{*}} \neg f$ (“blue penguins don’t fly”). This conclusion follows directly from
the rule of rational monotony (Eq. 3.6): Given that $p \to \neg f \in D_p$, $p \mathrel{|\!\sim^{*}} \neg f$ by the
requirement of admissibility. Now, since any model $\omega \models p \wedge bl$ falsifies exactly the same
rules as any model $\omega' \models p \wedge \neg bl$, $\kappa^{*}(p \wedge bl) = \kappa^{*}(p \wedge \neg bl)$, and hence $\neg bl$ does not
follow from p. Thus, by rational monotony $p \wedge bl \mathrel{|\!\sim^{*}} \neg f$. In general, we have the
following proposition:

Proposition 3.15 Let D be a set of defaults, and let p be a proposition not
appearing in any of the defaults in D; then

$p \wedge \varphi \mathrel{|\!\sim^{*}} \sigma$ iff $\varphi \mathrel{|\!\sim^{*}} \sigma$.
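
The numbers in Example 3.1 can be checked mechanically. The sketch below evaluates $\kappa^{*}$ of a formula by brute force (Eq. 3.20 minimized over the models of the formula) and applies the criterion of Eq. 3.22. The Z*-values are the ones quoted in the example; the encoding and the helper names `kappa` and `entails` are assumptions of this illustration.

```python
from itertools import product

ATOMS = ["p", "b", "f", "bl"]
# (antecedent, consequent, Z*) for D_p, with the Z*-values of Example 3.1.
D_p = [
    (lambda w: w["p"], lambda w: not w["f"], 2),
    (lambda w: w["p"], lambda w: w["b"], 2),
    (lambda w: w["b"], lambda w: w["f"], 1),
]

def kappa(formula):
    """Lowest summed Z* of violated rules over all models of `formula`."""
    best = None
    for values in product([False, True], repeat=len(ATOMS)):
        w = dict(zip(ATOMS, values))
        if formula(w):
            rank = sum(z for phi, psi, z in D_p if phi(w) and not psi(w))
            best = rank if best is None else min(best, rank)
    return best  # None when the formula has no model

def entails(phi, psi):
    """Criterion of Eq. 3.22; an unsatisfiable formula is treated as rank infinity."""
    yes = kappa(lambda w: phi(w) and psi(w))
    no = kappa(lambda w: phi(w) and not psi(w))
    return yes is not None and (no is None or yes < no)

k_no_fly = kappa(lambda w: w["bl"] and w["p"] and not w["f"])
k_fly = kappa(lambda w: w["bl"] and w["p"] and w["f"])
blue_penguins_dont_fly = entails(lambda w: w["bl"] and w["p"], lambda w: not w["f"])
penguins_not_blue = entails(lambda w: w["p"], lambda w: not w["bl"])
```

Since bl occurs in no rule, adding it to the context leaves both ranks unchanged, and p alone does not sanction the conclusion that penguins are not blue.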

Example 3.2 (Inheritance) In this example, we consider whether penguins,
despite being an exceptional class of birds (with respect to flying), can inherit
other properties typical of birds. Consider the set $D_w = D_p \cup \{b \to w\}$.12
The ranking $Z^{*}$ remains identical to the one in Ex. 3.1 except for the new rule
$Z^{*}(b \to w) = 1$. Note that since $\kappa^{*}(p \wedge w) = 1$ (in the most preferred model for
$p \wedge w$, the default $b \to f$ is violated), while $\kappa^{*}(p \wedge \neg w) = 2$ (either both $b \to f$
and $b \to w$ or only $p \to b$ are violated in the most preferred models for $p \wedge \neg w$),
we conclude $p \mathrel{|\!\sim^{*}} w$ (i.e., “penguins have wings”).

It is instructive to compare the behavior of $|\!\sim^{*}$ with the behavior of the rational
closure of Lehmann [72] or, equivalently, system-Z [100]. System-Z selects
the probability function $P^{+}$ (among those admissible with D) that assigns to each
world the highest possible probability (i.e., lowest rank) consistent with D.13
Entailment is then defined in terms of the consequence relation induced by $P^{+}$. A
parallel can be drawn with the rational closure in terms of ranked models. Rational
closure also attempts to correct the conservative behavior of preferential
entailment [69] (which is essentially equivalent to probabilistic entailment) by
restricting the set of consequence relations that define entailment. The resulting
ranked model and notion of entailment present the same properties as $P^{+}$ and
its consequence relation (this equivalence is formally shown in [48]). Given $D_w$
(Ex. 3.2), $p \mathrel{|\!\sim} w$ will not follow from the rational closure of $D_w$: Once penguins
are found to be exceptional birds with respect to flying, the consequence relation
in the rational closure will regard penguins as exceptional with respect to any other
property of birds. The source of this counterintuitive behavior can be traced to
the ranking function $\kappa^{+}$ (associated with $P^{+}$) that sanctions this consequence
relation, defined by a pair of equations similar to Eqs. 3.20 and 3.21 [48, 100, 50]:

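\[
\kappa^{+}(\omega) = \max_{r_i \in D^{-}_{\omega}} Z^{+}(r_i) \tag{3.26}
\]

where

\[
Z^{+}(r_i) = \min_{\omega \models \varphi_i \wedge \psi_i} \kappa^{+}(\omega) \tag{3.27}
\]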

Whereas the maximum entropy approach uses $\sum$ and tries to minimize the
weighted sum of default violations, system-Z uses “max” and considers only the
most significant default violated. Thus, a world in which a penguin has no wings
($\omega' \models p \wedge \neg w$) is no more surprising than one in which a penguin has wings; once
the rule $b \to f$ is violated, the additional rule $b \to w$ violated in $\omega'$ does not alter
$\kappa^{+}$.
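
The effect of replacing the sum by a maximum can be seen on $D_w$ with a short sketch. To isolate the aggregation step, the same rule weights (the Z*-values quoted in Examples 3.1 and 3.2) are used under both operators; this is an assumption made purely for illustration, since the actual system-Z values are those determined by Eqs. 3.26 and 3.27 and are not computed here.

```python
from itertools import product

ATOMS = ["p", "b", "f", "w"]
# (antecedent, consequent, weight) for D_w.
RULES = [
    (lambda x: x["p"], lambda x: not x["f"], 2),
    (lambda x: x["p"], lambda x: x["b"], 2),
    (lambda x: x["b"], lambda x: x["f"], 1),
    (lambda x: x["b"], lambda x: x["w"], 1),
]

def rank(formula, combine):
    """Lowest rank of any model of `formula`; `combine` aggregates violations."""
    ranks = []
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if formula(world):
            violated = [z for phi, psi, z in RULES
                        if phi(world) and not psi(world)]
            ranks.append(combine(violated) if violated else 0)
    return min(ranks)

def wings(x):
    return x["p"] and x["w"]

def no_wings(x):
    return x["p"] and not x["w"]

sum_ranks = (rank(wings, sum), rank(no_wings, sum))  # additive, as in Eq. 3.20
max_ranks = (rank(wings, max), rank(no_wings, max))  # max-style aggregation
```

Under the sum the winged penguin is strictly less surprising, whereas under the maximum the two ranks coincide and the conclusion that penguins have wings is lost.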

The preference for worlds that violate fewer rules is a general property of the
maximum entropy approach, and is made precise by the following proposition:

12 To be read as “typically birds have wings”.

13 A summary of this strategy is presented in Chap. 4.
Proposition 3.16 Let D be an MC set, $D^{-}_{\omega}$ denote the set of defaults violated
by model $\omega$, and $D^{-}_{\omega'}$ denote the set of defaults violated by model $\omega'$. If
$D^{-}_{\omega} \subset D^{-}_{\omega'}$, then $\kappa^{*}(\omega) < \kappa^{*}(\omega')$.

In other words, if the set of defaults violated by a model $\omega$ is a proper subset
of those violated by another model $\omega'$, then $\omega$ is strictly preferable to $\omega'$. This
coherence property seems natural, intuitive, and useful in applications such as
fault diagnosis (see [110]). Suppose we know that “typically component p is not
faulty” ($True \to \neg p$) and “typically component q is not faulty” ($True \to \neg q$).
Given the observation that $p \vee q$ (either p or q is faulty), a reasonable conclusion
to expect is that either p or q is faulty but not both. Indeed, since any model
satisfying $[p \vee q] \wedge [p \wedge q]$ must violate a superset of the defaults violated by
$[p \vee q] \wedge \neg[p \wedge q]$, we conclude $(p \vee q) \mathrel{|\!\sim^{*}} (\neg p \vee \neg q)$, as expected.14
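
A minimal numerical check of the diagnosis example follows. Both defaults are tolerated by D (the fault-free world verifies each without falsifying the other), so step 2 of procedure Z*_order assigns Z = 1 to each; the dictionary `Z` and the helper `kappa` are names introduced for this sketch only.

```python
from itertools import product

# Z-value of the default "component c is typically not faulty", per component.
Z = {"p": 1, "q": 1}

def kappa(formula):
    """Lowest summed Z of violated defaults over the models of `formula`."""
    ranks = []
    for p, q in product([False, True], repeat=2):
        world = {"p": p, "q": q}
        if formula(world):
            # A faulty component violates its "not faulty" default.
            ranks.append(sum(Z[c] for c in Z if world[c]))
    return min(ranks)

exactly_one_faulty = kappa(lambda w: (w["p"] or w["q"]) and not (w["p"] and w["q"]))
both_faulty = kappa(lambda w: w["p"] and w["q"])
```

The single-fault worlds violate one default and the double-fault world violates both, so the former are strictly preferred, in line with Proposition 3.16.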

Example 3.3 (Independent properties) Consider $D_s = \{s \to w,\ s \to t\}$,
standing for “typically Swedes are well-mannered” and “typically Swedes are
tall”. Since there is no explicit dependency between being well-mannered and
being tall in $D_s$, a desirable default conclusion from $D_s$ is $\neg t \wedge s \mathrel{|\!\sim^{*}} w$ (i.e.,
“short Swedes are well-mannered”). Again, this conclusion is sanctioned by $|\!\sim^{*}$ but not
by the rational closure (or system-Z).

The $|\!\sim^{*}$ consequence relation is sensitive to the way in which the default rules
are expressed. For example, had we encoded the information in $D_s$ (Ex. 3.3)
slightly differently, using $D'_s = \{s \to (w \wedge t)\}$ (“typically Swedes are
well-mannered and tall”), we would no longer be able to conclude $s \wedge \neg t \mathrel{|\!\sim^{*}} w$
(“typically Swedes who are not tall are well-mannered”). This sensitivity to the format
in which rules are expressed seems at odds with one of the basic conventions
of traditional logic, in which $a \to (b \wedge c)$ is regarded as shorthand for $a \to b$
and $a \to c$, and also stands in contrast with most other proposals for default
reasoning (e.g., circumscription). However, this sensitivity might be useful for
distinguishing fine nuances in natural discourse, treating w and t as two independent
properties if expressed by two rules (i.e., “typically Swedes are tall”
and “typically Swedes are well-mannered”) and as related properties if expressed
together (i.e., “typically Swedes are tall and well-mannered”).
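
The difference between the two encodings can be reproduced with a brute-force sketch. Every rule in both sets is tolerated by its own set, so each receives Z = 1 in step 2 of procedure Z*_order; the list-of-triples encoding and the function names are assumptions of this illustration.

```python
from itertools import product

ATOMS = ["s", "w", "t"]

def kappa(formula, rules):
    """Minimum over models of `formula` of the summed Z of violated rules."""
    ranks = []
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if formula(world):
            ranks.append(sum(z for phi, psi, z in rules
                             if phi(world) and not psi(world)))
    return min(ranks)

def short_swedes_well_mannered(rules):
    """Eq. 3.22 for the query: s and not t entails w."""
    yes = kappa(lambda x: x["s"] and not x["t"] and x["w"], rules)
    no = kappa(lambda x: x["s"] and not x["t"] and not x["w"], rules)
    return yes < no

# Two separate rules versus one conjoined rule, each with Z = 1.
D_s = [(lambda x: x["s"], lambda x: x["w"], 1),
       (lambda x: x["s"], lambda x: x["t"], 1)]
D_s_joint = [(lambda x: x["s"], lambda x: x["w"] and x["t"], 1)]

separate = short_swedes_well_mannered(D_s)
joint = short_swedes_well_mannered(D_s_joint)
```

With two rules, a short and ill-mannered Swede violates both defaults; with the conjoined rule, any short Swede already violates the only default, so manners make no further difference.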

14 This example is taken from [83], where the following question is posed: “Can the fact that
we derive $\neg p \vee \neg q$ from $p \vee q$ when p, q are jointly circumscribed be explained in terms of
probabilities close to 0 or 1?”.
3.5 Non-Minimal-Core Sets


In  the  maximum  entropy  distribution,  the  constraint  imposed  by  a  default  rule  in 
D  (see  Eqs.  3.12  and  3.13)  can  be  satisfied  as  either  an  equality  ( active  constraint) 
or  a  strict  inequality  ( passive  constraint).  The  MC  condition  on  D  (Def.  3.11) 
guarantees  that  all  these  constraints  are  active.  Once  we  relax  this  condition, 
not  only  does  the  process  of  finding  a  solution  for  the  set  of  Eqs.  3.18  become 
more  complex,  but  the  resulting  ranking  may  no  longer  represent  a  solution  to 
the  entropy  maximization  problem.  This  is  because  Eqs.  3.18  are  the  result 
of  applying  the  Lagrange  multipliers  technique  on  the  constraints  represented 
by  Eq.  3.12.  This  technique  finds  maxima  on  the  boundaries  defined  by  these 
constraints,  blindly  assuming  that  all  constraints  are  satisfied  as  equalities  (i.e., 
are  active,  see  Appendix  B),  whereas  all  we  are  required  to  do  is  to  satisfy  the 
constraints  of  Eq.  3.12  with  inequalities. 

Another problem with using non-MC sets D is that of redundant rules, that
is, rules r that are already satisfied by the consequence relation $|\!\sim^{*}$ induced by
$D - \{r\}$ (see Prop. 3.12). It may be thought that these rules can be safely
ignored and removed from the original knowledge base; however, in specifying a
particular D, the user often intends for all rules in D to play an active role in
shaping the consequence relation $|\!\sim^{*}$. Overlooking this intention may lead to
counterintuitive results, as the following example demonstrates.15

Example 3.4 (Active set.) Consider the sets $D_a = \{a \to b,\ b \to c\}$ and
$D_{a+} = D_a \cup \{a \to c\}$. Note that $D_{a+}$ is not an MC set. If we run procedure Z*_order
on $D_a$, we find that the values $Z^{*}(a \to b)$ and $Z^{*}(b \to c)$ are both equal to 1,
and that $a \mathrel{|\!\sim^{*}} c$ is in the consequence relation induced by $D_a$ since $Z^{*}(a \to b)$ and
$Z^{*}(b \to c)$ satisfy

\[
\min_{\omega \in \Omega^{-}_{a \to c}} \Big[ \sum_{r_k \in D^{-}_{\omega}} Z(r_k) \Big]
  = 1 + \min_{\omega \in \Omega^{+}_{a \to c}} \Big[ \sum_{r_j \in D^{-}_{\omega}} Z(r_j) \Big]. \tag{3.28}
\]

Thus, since the constraint imposed by the rule $a \to c$ in $D_{a+}$ is satisfied by the
maximum entropy solution to $D_a$, the two sets will have the same maximum
entropy solution and the same consequence relation. Yet these two sets, $D_a$
and $D_{a+}$, are not equivalent: While we do not expect $a \wedge \neg b \mathrel{|\!\sim^{*}} c$ to hold in $D_a$
(the only possible “inference path” to c in $D_a$ goes through b), we would like
$a \wedge \neg b \mathrel{|\!\sim} c$ to be in any reasonable consequence relation induced by $D_{a+}$ (where
the rule $a \to c$ provides an alternative “path” to c).
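
The claims of Example 3.4 about $D_a$ can be verified by enumeration. The sketch below checks the MC property of both sets and evaluates the two queries on $D_a$ with the Z*-values stated in the example; the encoding and helper names are assumptions of this illustration, and no attempt is made here to solve the constraints for $D_{a+}$.

```python
from itertools import product

ATOMS = ["a", "b", "c"]
A_B = (lambda w: w["a"], lambda w: w["b"])
B_C = (lambda w: w["b"], lambda w: w["c"])
A_C = (lambda w: w["a"], lambda w: w["c"])

def all_worlds():
    return [dict(zip(ATOMS, v)) for v in product([False, True], repeat=3)]

def violated(world, rules):
    """Indices of the rules falsified by `world`."""
    return [i for i, (phi, psi) in enumerate(rules)
            if phi(world) and not psi(world)]

def is_minimal_core(rules):
    """Each rule must have a world that falsifies it and nothing else."""
    return all(any(violated(w, rules) == [i] for w in all_worlds())
               for i in range(len(rules)))

def entails(phi, psi, rules, z):
    """Criterion of Eq. 3.22 by brute force (both formulas assumed satisfiable)."""
    def kappa(f):
        return min(sum(z[i] for i in violated(w, rules))
                   for w in all_worlds() if f(w))
    return (kappa(lambda w: phi(w) and psi(w))
            < kappa(lambda w: phi(w) and not psi(w)))

D_a, D_a_plus = [A_B, B_C], [A_B, B_C, A_C]
z_a = [1, 1]  # Z*-values stated in Example 3.4

mc_a, mc_a_plus = is_minimal_core(D_a), is_minimal_core(D_a_plus)
a_entails_c = entails(lambda w: w["a"], lambda w: w["c"], D_a, z_a)
a_not_b_entails_c = entails(lambda w: w["a"] and not w["b"],
                            lambda w: w["c"], D_a, z_a)
```

In $D_a$, once b is known to be false the rule b -> c no longer applies, so both completions of the context violate only a -> b and c is left undecided.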

15This  example  is  a  modified  version  of  one  originally  suggested  by  Andrew  Baker  (personal 
communication) . 




The problem rests with using the equality $P_\varepsilon(c|a) = 1 - C \times \varepsilon$ as a constraint
to the maximization process,16 where in fact the constraint intended by the user
is stronger, requiring a faster convergence of $P^{*}(c|a)$ towards 1. The use of
the Lagrange multipliers technique requires a commitment to a particular rate of
convergence for each rule. Had we started with the insight that $\lim_{\varepsilon \to 0} P(c|a) = 1$
should be represented by $P(c|a) = 1 - C \times \varepsilon^{2}$ instead, the problem could have
been solved using Eq. 3.18; all constraints in $D_{a+}$ will be active, yielding

\[
Z(a \to c) + \min_{\omega \in \Omega^{-}_{a \to c}} \Big[ \sum_{\substack{r_k \in D^{-}_{\omega} \\ r_k \neq a \to c}} Z(r_k) \Big]
  = 2 + \min_{\omega \in \Omega^{+}_{a \to c}} \Big[ \sum_{r_j \in D^{-}_{\omega}} Z(r_j) \Big] \tag{3.29}
\]

and the consequence relation $|\!\sim^{*}$ induced by $D_{a+}$ will satisfy $a \wedge \neg b \mathrel{|\!\sim^{*}} c$.

In general we could write Eq. 3.18 in terms of slack variables $n_i$ for each
constraint,

\[
Z(r_i) + \min_{\omega \in \Omega^{-}_{r_i}} \Big[ \sum_{\substack{r_k \in D^{-}_{\omega} \\ k \neq i}} Z(r_k) \Big]
  = n_i + \min_{\omega \in \Omega^{+}_{r_i}} \Big[ \sum_{r_j \in D^{-}_{\omega}} Z(r_j) \Big] \tag{3.30}
\]

and seek the correct values of $n_i$ that would render every rule active. Computationally,
however, guessing the $n_i$'s is not easier than guessing the set of active
constraints.

We  see  that  the  advantages  of  MC  sets  are  twofold:  They  guarantee  conver¬ 
gence  of  procedure  Z*_order  to  a  solution  of  Eq.  3.18,  and  this  solution  represents 
a  solution  to  the  entropy  maximization  problem.  Unfortunately,  to  ensure  these 
guarantees  the  expressiveness  of  the  knowledge  bases  must  be  limited  to  MC, 
which  may  prevent  us  from  specifying  certain  rule  sets  in  the  most  natural  way. 
It  turns  out  that  the  class  of  knowledge  bases  where  these  guarantees  hold  is  in 
fact  wider  than  MC,  since  Eqs.  3.16  and  3.17  are  valid  as  long  as  all  rules  are 
active. Consider the set $D_p$ of Example 2.2,17 augmented with the rule $p \to a$
(“typically penguins live in the Antarctic”). This set is not MC since any model
falsifying $p \to a$ must falsify at least one other default in $D_p$. Nevertheless, all
rules in $D_p \cup \{p \to a\}$ are active. Notice that $p \to a$ is insensitive to the Z-values
associated with each of the rules in $D_p$, and vice versa. In other words, any
change in the value of the Z associated with a rule in $D_p$ will not affect the Z
associated with $p \to a$.18 Thus, we can compute the Z-values for $D_p$ and $p \to a$

16 Note that $P_\varepsilon(c|a) \ge 1 - C \times \varepsilon$ is equivalent to $C' \times \varepsilon \times P_\varepsilon(c \wedge a) \ge P_\varepsilon(\neg c \wedge a)$ for expressing
$\lim_{\varepsilon \to 0} P(c|a) = 1$ as a constraint for the maximization process.

17 Recall that $D_p = \{p \to \neg f,\ p \to b,\ b \to f\}$.

18 The instance of Eq. 3.18 for $p \to a$ shows that the min-terms on both sides are identical
and therefore can be canceled.
separately using two independent applications of procedure Z*_order, one on $D_p$
and the other on $p \to a$, and then combine the resulting Z's. We see that certain
topological features of D permit its decomposition into a set of components
belonging  to  MC,  hence  every  rule  will  be  active.  The  full  characterization  of 
databases  in  which  all  rules  are  active  remains  an  open  problem. 

3.6  Discussion 

Maximum  entropy  can  be  viewed  as  an  extension  of  both  p-entailment  and  the  ra¬ 
tional  closure.  Like  p-entailment,  it  is  based  on  infinitesimal  probability  analysis; 
and  like  rational  closure,  it  is  based  on  a  unique  ranking  of  possible  worlds  sub¬ 
ject  to  constraints.  In  the  rational  closure,  however,  possible  worlds  are  given  the 
lowest  rank  that  is  consistently  possible,  and  hence  the  rank  of  a  model  is  equal 
to  the  rank  of  the  most  crucial  rule  violated  by  that  model.19  In  contrast,  maxi¬ 
mum  entropy  ranks  models  according  to  the  weighted  sum  of  rule  violations,  and 
it  is  this  difference  that  enables  maximum  entropy  to  sanction  inheritance  across 
exceptional  subclasses,  concluding  that  “penguins  have  wings”  in  Example  3.2 
and  that  “short  Swedes  are  well-mannered”  in  Example  3.3. 

The  ranking  of  rules  in  maximum  entropy  is  reminiscent  of  abnormality  pref¬ 
erences  in  prioritized  circumscription  [88],  with  the  difference  that  the  priorities 
assigned  by  maximum  entropy  are  extracted  automatically  from  the  knowledge 
base  and  do  not  need  the  intervention  of  the  user.  This  property  is  shared  by 
Geffner’s [36] conditional entailment, which also combines the virtues of the
extensional and the conditional approaches. Conditional entailment maintains partial
orders  among  rule  priorities  and  among  models,  and  it  produces  more  acceptable 
inferences  than  maximum  entropy  in  certain  cases  (see  [36]),  at  the  expense  of 
a  greater  computational  complexity  and  the  loss  of  the  underlying  probabilistic 
semantics. 

A weakness of the maximum entropy approach is that it stands at odds with
the principle of causation.20 If we first compute the maximum entropy distribution
$P^{*}(X_1, \ldots, X_n)$ on a set of variables $X_1, \ldots, X_n$ and then consider one
of their consequences Y, we may find that the maximum entropy distribution
$P^{*}(X_1, \ldots, X_n, Y)$ constrained by the conditional probability $P(Y | X_1, \ldots, X_n)$
changes the probabilities of the X variables. For example, specifying the biases

19This  ranking  is  called  Z- rank  in  [100]. 

20This  weakness,  shared  by  many  proposals  for  nonmonotonic  reasoning  [57],  has  required 
the  introduction  of  special  causal  operators  [116,  36].  Chapter  5  proposes  a  different  approach 
to  causality  based  on  probabilistic  considerations  of  independence. 
of two coins yields a maximum entropy distribution in which the two coins are
mutually  independent.  However,  further  specifying  the  probability  with  which 
an  observer  would  respond  to  each  of  the  four  outcomes  of  these  coins  might 
yield  a  maximum  entropy  distribution  in  which  the  two  coins  are  no  longer  inde¬ 
pendent  of  each  other  (see  [97,  pp.  463,  519],  and  [60]). 21  This  stands  contrary 
to  our  conception  of  causality  in  which  the  forecasting  of  future  events  does  not 
alter  beliefs  about  past  events.  This  behavior  prevents  the  maximum  entropy 
approach  from  properly  handling  tasks  such  as  the  Yale  shooting  problem  [57], 
where  rules  of  causal  character  are  given  priority  over  other  rules.  Such  priorities 
can be introduced in the $\kappa$-ranking system using a device called stratification (see
Chap. 5), which forces $\kappa$ to obey the so-called Markov condition: Knowing the
present  renders  the  future  independent  of  the  past.  The  role  of  maximum  entropy 
in  stratified  rankings  could  then  be  to  define  preferred  rankings  under  incomplete 
specification  of  causal  influences:  Out  of  all  admissible  rankings  that  conform  to 
the  stratification  condition,  choose  those  with  the  maximum  entropy. 

The maximum entropy formalism can be extended to admit defaults with
variable strengths; each default in D can be annotated with a parameter $\delta_i$ that
indicates the firmness with which the default is believed.22 Probabilistically, $\delta_i$
represents the slowest rate at which $P(\psi_i | \varphi_i)$ should be allowed to approach one as
$\varepsilon$ approaches zero. The constraints to the maximization process can be modified
accordingly; thus, Eq. 3.12 will now read

\[
C_i \times \varepsilon^{\delta_i} \times P_\varepsilon(\psi_i \wedge \varphi_i) \ge P_\varepsilon(\neg\psi_i \wedge \varphi_i), \tag{3.31}
\]

and, given an MC set D, Eqs. 3.20 and 3.21 translate to

\[
\kappa^{*}(\omega) = \sum_{r_i \in D^{-}_{\omega}} Z^{*}(r_i) \tag{3.32}
\]

\[
Z^{*}(r_i) = \min_{\omega \models \varphi_i \wedge \psi_i} \kappa^{*}(\omega) + 1 + \delta_i \tag{3.33}
\]

where the Z*-order for each rule can be computed using procedure Z*_order.

This  feature  is  very  useful  in  domains  such  as  circuit  diagnosis  where  the 
analyst  may  feel  strongly  that  failures  are  more  likely  to  occur  in  one  group  of 
devices (e.g., multipliers) than in another (e.g., adders). For example, suppose
that  in  addition  to  the  information  “typically  component  p  is  not  faulty”  and 
“typically  component  q  is  not  faulty”,  we  also  know  that  component  p  is  much 

21  Pearl  attributes  the  discovery  of  this  phenomenon  to  Norm  Dalkey  [97]. 

22 An extension to system-Z [100] along these lines can be found in Chapter 4 (see also
[50, 53]).
more likely to fail than component q. We can encode this information by
specifying $D = \{True \xrightarrow{\delta_1} \neg p,\ True \xrightarrow{\delta_2} \neg q\}$ where $\delta_1 < \delta_2$. Thus, given that $p \vee q$
(either p or q is faulty) holds, we conclude that p is the faulty component
($(p \vee q) \mathrel{|\!\sim^{*}} (p \wedge \neg q)$). If entailment is defined as the intersection of
the consequence relations induced by all sets of values $\delta_i$, then the resulting
entailment relation will be supported by partial orders among rules and models,
as in Geffner’s [36] conditional entailment.
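
The effect of the strengths on the diagnosis example can be traced with a few lines of Python. The sketch applies Eqs. 3.32 and 3.33 as printed above; the concrete values of the strengths (0 and 1) are hypothetical, chosen only to satisfy $\delta_1 < \delta_2$, and the function name `diagnose` is an assumption of this illustration.

```python
from itertools import product

def diagnose(delta_p, delta_q):
    """Return kappa* for each fault scenario consistent with 'p or q'."""
    # Both defaults (True -> not p, True -> not q) are verified by the
    # fault-free world, whose rank is 0, so Eq. 3.33 gives Z = 0 + 1 + delta.
    z = {"p": 1 + delta_p, "q": 1 + delta_q}
    ranks = {}
    for p, q in product([False, True], repeat=2):
        if p or q:
            # Eq. 3.32: sum the Z-values of the defaults the world violates.
            ranks[(p, q)] = (z["p"] if p else 0) + (z["q"] if q else 0)
    return ranks

# Hypothetical strengths with delta_p < delta_q.
ranks = diagnose(delta_p=0, delta_q=1)
most_plausible = min(ranks, key=ranks.get)  # (p faulty?, q faulty?)
```

With these values the world in which only p is faulty has the lowest rank, so the observation that p or q is faulty yields the conclusion that p is faulty and q is not.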

Chapter 4 studies such an extension to Pearl’s system-Z [100], in which each
$\varphi \to \psi$ is annotated with a positive integer $\delta$ denoting the degree of strength or
firmness of the rule.
CHAPTER 4


Plausibility II: System-Z+

4.1  Rankings  as  an  Order-of-Magnitude  Abstraction  of 
Probabilities 

Regardless  of  how  we  choose  to  interpret  default  statements,  it  is  generally  ac¬ 
knowledged  that  some  defaults  are  stated  with  greater  firmness  than  others.  For 
example,  the  action-response  defaults  of  the  type  “if  Fred  is  shot  with  a  loaded 
gun,  Fred  is  dead”  are  issued  with  a  greater  conviction  than  persistence  defaults 
of  the  type  “if  Fred  is  alive  at  time  t,  he  is  alive  at  t  +  1”.  Moreover,  the  degree 
of  conviction  in  this  last  statement  should  clearly  depend  on  whether  t  is  mea¬ 
sured  in  years  or  seconds.  In  diagnosis  applications,  likewise,  the  analyst  may 
feel  strongly  that  failures  are  more  likely  to  occur  in  one  type  of  device  (e.g., 
multipliers)  than  in  another  (e.g.,  adders).  A  language  must  be  devised  for  ex¬ 
pressing  this  valuable  knowledge.  Numerical  probabilities  or  degrees  of  certainty 
have  been  suggested  for  this  purpose,  but  if  the  full  precision  provided  by  numer¬ 
ical  calculi  is  not  necessary,  an  intermediate  qualitative  language  might  be  more 
suitable. 

This  chapter  proposes  such  a  language  in  terms  of  the  ranking  functions  in¬ 
troduced  in  Chapter  3  (Def.  3.7).  This  method  of  approximation  gives  rise  to  a 
semiqualitative  calculus  of  uncertainty:  Degrees  of  (dis)belief  are  ranked  by  non¬ 
negative  integers  (corresponding  perhaps  to  linguistic  quantifiers  such  as  “believ¬ 
able”,  “unlikely”,  “very  rare”,  etc);  retraction  and  restoration  of  beliefs  conforms 
to  Bayesian  conditionalization. 

One way of motivating ranking systems is to consider a probability distribution
P defined over a set $\Omega$ of possible worlds and to imagine that an agent wishes
to retain an order-of-magnitude approximation of P. The traditional engineering
method of approximating P would be to express each numerical parameter
(specifying P) in a base b representation, where b depends on the precision needed, and
then omit all but the most significant figure from each expression.1 All arithmetic

1 Thus, given a number n and a basis b, its approximate would be the polynomial expression
$a_0 \cdot b^{0} + a_1 \cdot b^{1} + a_2 \cdot b^{2} + \ldots$.
operations would then be performed on these approximate, single digit quantities,
in lieu of the original parameters. The abstraction we advocate goes one
step further. Instead of retaining the numerical value of the most significant figure,
we retain only its position. The mechanics of this exercise is equivalent to,
and can best be described by, a limit process where quantities are represented
as polynomials in an infinitesimal number $\varepsilon$. These polynomials are added and
multiplied precisely, but at the end we calculate the limit of the final results as $\varepsilon$
goes to zero.

Imagine that the probability $P(\omega)$ is a polynomial function of some infinitesimal
parameter $\varepsilon$, arbitrarily close to, yet larger than zero; for example,
$P(\omega) = 1 - c_1\varepsilon$ or $\varepsilon^{2} - c_2\varepsilon^{4}$.2 Accordingly, the probabilities assigned to any subset of $\Omega$
represented by a logical formula $\varphi$, as well as all conditional probabilities $P(\psi|\varphi)$,
will be rational functions of $\varepsilon$. We define the ranking function $\kappa(\psi|\varphi)$ as the
power of the most significant $\varepsilon$-term in the expansion of $P(\psi|\varphi)$. In other words,
$\kappa(\psi|\varphi) = n$ iff $P(\psi|\varphi)$ has the same order of magnitude as $\varepsilon^{n}$ (see Def. 3.7).3
Thus, instead of measuring probabilities on a scale from zero to one, we can
imagine projecting probability measures onto a quantized logarithmic scale and
then treating beliefs that map onto two different quanta as being of different
orders of magnitude.

The  following  properties  of  ranking  functions  (left-hand  side  below)  reflect, 
on  a  logarithmic  scale,  the  usual  properties  of  probability  functions  (right-hand 
side),  with  min  replacing  addition,  and  addition  replacing  multiplication: 


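\[
\begin{array}{lll}
\kappa(\varphi) = \min_{\omega \models \varphi} \kappa(\omega) & \qquad P(\varphi) = \sum_{\omega \models \varphi} P(\omega) & \qquad (4.1) \\[4pt]
\kappa(\varphi) = 0 \ \text{or} \ \kappa(\neg\varphi) = 0 & \qquad P(\varphi) + P(\neg\varphi) = 1 & \qquad (4.2) \\[4pt]
\kappa(\psi \wedge \varphi) = \kappa(\psi|\varphi) + \kappa(\varphi) & \qquad P(\psi \wedge \varphi) = P(\psi|\varphi) P(\varphi) & \qquad (4.3)
\end{array}
\]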
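
The correspondence between the two columns can be illustrated on polynomials in $\varepsilon$. In the sketch below a polynomial is a dictionary from exponents to coefficients and its rank is the lowest exponent with a nonzero coefficient; the two sample polynomials and all function names are hypothetical. The "min for addition" rule relies on the leading terms not canceling, which holds for sums of probabilities since their leading coefficients are positive.

```python
def rank(poly):
    """Rank of a polynomial in epsilon: lowest exponent with nonzero coefficient."""
    return min(e for e, c in poly.items() if c != 0)

def add(p, q):
    """Sum of two polynomials given as {exponent: coefficient} maps."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return out

def mul(p, q):
    """Product of two polynomials given as {exponent: coefficient} maps."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return out

# Hypothetical probability expressions.
P1 = {2: 3.0, 4: -1.0}   # 3*eps^2 - eps^4, rank 2
P2 = {1: 0.5}            # 0.5*eps, rank 1

sum_rank = rank(add(P1, P2))        # behaves like min(rank(P1), rank(P2))
product_rank = rank(mul(P1, P2))    # behaves like rank(P1) + rank(P2)
```

Only the position of the leading term survives the abstraction; the coefficients play no role in the resulting ranks.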


Parameterizing  a  probability  measure  by  e  and  extracting  the  lowest  exponent 
of  £  as  the  measure  of  (dis)belief  was  proposed  in  [98]  as  a  model  of  the  process  by 
which  people  abstract  qualitative  beliefs  from  numerical  probabilities  and  accept 
them  as  tentative  truths.  For  example,  we  can  make  the  correspondence  between 
linguistic quantifiers and $\varepsilon^{n}$ depicted in Table 4.1. These approximations yield

2 Probability functions parameterized on $\varepsilon$ were called PPD’s in Chapter 3. They are formally
introduced in Definition 3.1, where the $\varepsilon$-functions were restricted to be analytic at $\varepsilon = 0$.
The probability functions described above can also be viewed as the Taylor approximation of
these PPD’s.

3 Spohn [120] was the first to study such ranking functions, which he named
(non-probabilistic) ordinal conditional functions (OCF). His main motivation was to account for the
dynamics of plain beliefs.
$P(\varphi) = \varepsilon^{0}$    $\varphi$ and $\neg\varphi$ are believable         $\kappa(\varphi) = 0$
$P(\varphi) = \varepsilon^{1}$    $\neg\varphi$ is believed                       $\kappa(\varphi) = 1$
$P(\varphi) = \varepsilon^{2}$    $\neg\varphi$ is strongly believed              $\kappa(\varphi) = 2$
$P(\varphi) = \varepsilon^{3}$    $\neg\varphi$ is very strongly believed         $\kappa(\varphi) = 3$
$\vdots$

Table 4.1: Linguistic quantifiers and $\varepsilon^{n}$.

a  probabilistically  sound  calculus,  employing  integer  addition,  for  manipulating 
the  orders  of  magnitude  of  disbelief.  The  resulting  formalism  is  governed  by  the 
following  principles: 

1. Each world is ranked by a non-negative integer $\kappa$ representing the degree
of surprise associated with finding such a world.

2. Each wff is given the rank of the world with the lowest $\kappa$ (most normal
world) that satisfies that formula.

3. Given a collection of facts $\varphi$, we say that $\sigma$ follows from $\varphi$ with strength
$\delta$ if $\kappa(\neg\sigma|\varphi) > \delta$, or, equivalently, if the $\kappa$ rank of $\varphi \wedge \neg\sigma$ is at least $\delta + 1$
degrees above that of $\varphi$.

Principles 1 and 2 follow immediately from the semantics described above. Principle 3
says that $\sigma$ is plausible given $\phi$ iff $P(\sigma \mid \phi) > 1 - c\epsilon^{\delta+1}$, where $P$ is the
$\epsilon$-parameterized probability associated with that particular ranking $\kappa$. This
abstraction of probabilities matches the notion of plain belief in that it is deductively
closed;4 the drawback of this abstraction is that many small probabilities do not
accumulate into a strong argument (as in the lottery paradox).
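
The deductive closure claimed in footnote 4 can be checked in one line, assuming only that the rank of a wff is the lowest rank among its models (Principle 2), so that the rank of a disjunction is the minimum of the ranks of its disjuncts. The numbers used in the probabilistic contrast are illustrative and do not come from the text.

```latex
\[
\kappa(\neg(A \wedge B)) = \kappa(\neg A \vee \neg B)
  = \min\{\kappa(\neg A),\ \kappa(\neg B)\} > 0
  \quad \text{whenever } \kappa(\neg A) > 0 \text{ and } \kappa(\neg B) > 0 .
\]
% Threshold belief is not closed in this way: with P(A) = P(B) = 0.9,
% P(A \wedge B) can be as low as P(A) + P(B) - 1 = 0.8,
% which falls below a belief threshold of, say, 0.85.
```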

Reasoning using Principles 1 to 3 requires the specification of a complete
ranking function. In other words, the knowledge base must be sufficiently rich
to define the $\kappa$ associated with every world $\omega$.5 Unfortunately, in practice, such
specification is not readily available. We are usually given information in the form
4 If $A$ is believed and $B$ is believed, then $A \wedge B$ is believed because $\kappa(\neg(A \wedge B)) > 0$ whenever
$\kappa(\neg A) > 0$ and $\kappa(\neg B) > 0$. This deviates from the threshold conception of belief: if both $P(A)$
and $P(B)$ are above a certain threshold, $P(A \wedge B)$ may still be below that threshold.

5 This is also the case with the OCF described in Spohn [120].

of statements such as “birds normally fly”, which we interpret as $P(f \mid b) > 1 - \epsilon$
or, equivalently, $\kappa(\neg f \mid b) > 0$, and no information whatsoever about the flying
habits of red birds or non-birds. In this case, we still would like to conclude “red
birds normally fly”, even though the information given merely constrains $\kappa$ to
satisfy $\kappa(f \wedge b) < \kappa(\neg f \wedge b)$ (“it is less surprising to find a flying bird than a
non-flying one”) and is not sufficient for defining a complete ranking function.
Drawing plausible conclusions from such fragmentary pieces of information
requires additional inferential machinery to accomplish two functions: It should
enrich the specification of the ranking function with the needed information, and
it should operate directly on the specification sentences in the knowledge base,
rather than on the rankings of worlds (which are too numerous to list). Such
machinery is provided by a formalism called system-$Z^+$, described in this chapter,
which accepts knowledge in the form of graded if-then rules and computes the
plausibility of any given query by syntactic manipulation of these rules.

To accomplish these functions, system-$Z^+$ incorporates two principles in
addition to those given above:

4. Each input rule “if $\varphi$ then $\psi$ (with strength $\delta$)”, written $\varphi \stackrel{\delta}{\rightarrow} \psi$, is
interpreted as a constraint on the ranking $\kappa$, forcing every world in $\varphi \wedge \neg\psi$
to rank at least $\delta + 1$ degrees above the most normal world in $\varphi$, that is,
$\kappa(\neg\psi \mid \varphi) > \delta$.

5. Out of all rankings satisfying the constraints above, we adopt the ranking
$\kappa^+$ that assigns each world the lowest possible (most normal) rank.
Remarkably, this ranking will turn out to be unique.

Principle 4 is a straightforward generalization of the probabilistic reading of the
rules, $P(\psi \mid \varphi) > 1 - c\epsilon^{\delta+1}$. The parameter $\delta$ is an optional feature for the rule
encoder that augments the expressiveness of the knowledge base by assigning
strength to the rules. If $\delta$ is unspecified, it is assumed to be equal to zero, and
rules are interpreted as $P(\psi \mid \varphi) > 1 - c\epsilon$. A knowledge base with all $\delta = 0$
will be called a flat knowledge base. Remarkably, the addition of the $\delta$'s does
not increase the computational complexity of query-answering and consistency
testing.6 Moreover, we shall see that even a flat knowledge base induces a natural
priority on rules in order to respect specificity considerations (see Thm. 4.14).

Principle  5  reflects  the  assumption  of  maximal  ignorance;  unless  compelled 
otherwise,  assume  every  situation  to  be  as  normal  as  possible  (or  equivalently,  no 
situation  is  more  surprising  than  necessary).  In  contrast,  the  approach  based  on 

6 Both will require a polynomial number of propositional satisfiability tests.




maximum entropy (Chap. 3) selects the ranking $\kappa^*$ that minimizes dependencies
among propositions, to reflect only those implied by the rules in the knowledge
base. As will be seen, the advantage of system-$Z^+$ is that the algorithm necessary
for computing plausible conclusions is more efficient than the one for maximum
entropy (Sec. 3.3). As in the case of maximum entropy, a key step in the procedure
is the computation of a priority ordering $Z^+$ on the rules in the knowledge base.7
Section 4.3 (after some preliminary definitions in Sec. 4.2) introduces a procedure
that computes $Z^+$ in a polynomial number of propositional satisfiability tests and
hence is tractable in applications permitting restricted languages, such as Horn
expressions, network theories, or acyclic databases. Once the ordering $Z^+$ is
known, the degree to which a given query is denied or confirmed can be computed
in $O(\log |\Delta|)$ satisfiability tests (where $|\Delta|$ is the number of rules in the knowledge
base $\Delta$). On the other hand, as shown in Section 4.7 and partially discussed in
Sections 3.4 and 3.6, this computational advantage of system-$Z^+$ over maximum
entropy results in a weaker set of conclusions ratified by the system.

In Section 4.5, system-$Z^+$ is equipped with the capability to reason with soft
evidence or imprecise observations. Such a capability is important when we wish
to assess the plausibility of $\sigma$ (using Prin. 3 above) but the context $\phi$ is not
given with absolute certainty; all that can be ascertained is “$\phi$ is supported to
a degree $n$”. We propose two different strategies for computing a new ranking
$\kappa'$ from an initial one $\kappa$, given a soft evidential report supporting a wff $\phi$. The
first strategy, named J-conditionalization, is based on Jeffrey's rule of conditioning
[99]. It interprets the report as specifying that, “all things considered”, the
new degree of disbelief for $\neg\phi$ should be $\kappa'(\neg\phi) = n$. The second strategy, named
L-conditionalization, is based on the virtual evidence proposal described in [97,
Chap. 2]. It interprets the report as specifying the desired shift in the degree of
belief in $\phi$, as warranted by that report alone and “nothing else considered”. We
show that both J- and L-conditionalization have roughly the same complexity as
ordinary conditionalization. Section 4.6 relates and compares system-$Z^+$ to the
theory of belief revision in [3]. Finally, Section 4.7 summarizes the main results.

4.2  Preliminary  Definitions:  Rankings  Revisited 

7 These priorities should be distinguished from the strengths $\delta$ that are assigned to the rules
by their author; priorities represent the interactions among the rules and reflect considerations
such as specificity and relevance, which are applicable to systems with flat knowledge bases.

Consider a set $\Delta = \{ r_i \mid r_i = \varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i,\ 1 \le i \le n \}$, where $\varphi_i$ and $\psi_i$ are
propositional formulas, “$\rightarrow$” denotes a default connective as before, and $\delta_i$ is
a non-negative integer representing the degree of strength of rule $r_i$.8 Ranking
functions are defined as follows:

Definition 4.1 (Ranking) A ranking function $\kappa$ is an assignment of non-negative
integers to the elements in $\Omega$, such that $\kappa(\omega) = 0$ for at least one $\omega \in \Omega$.

□

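We extend this definition to induce rankings on wffs in accordance with the
probabilistic interpretation of Eq. 4.1:

\[
\kappa(\varphi) =
\begin{cases}
\min_{\omega \models \varphi} \kappa(\omega) & \text{if } \varphi \text{ is satisfiable} \\
\infty & \text{otherwise}
\end{cases}
\tag{4.4}
\]

Similarly, following Eq. 4.3, we define the conditional ranking $\kappa(\psi \mid \varphi)$ for a pair
of wffs $\varphi$ and $\psi$ as

\[
\kappa(\psi \mid \varphi) =
\begin{cases}
\kappa(\psi \wedge \varphi) - \kappa(\varphi) & \text{if } \kappa(\varphi) \neq \infty \\
\infty & \text{otherwise}
\end{cases}
\tag{4.5}
\]

Preferences are associated with lower $\kappa$, and surprise or abnormality with higher
$\kappa$. Thus, $\kappa(\psi) < \kappa(\varphi)$ if $\psi$ is preferred to $\varphi$ in $\kappa$ or, equivalently, if $\varphi$ is more
abnormal (surprising) than $\psi$. Intuitively, $\kappa(\psi \mid \varphi)$ stands for the degree of
incremental surprise or abnormality associated with finding $\psi$ to be true, given that
we already know $\varphi$. The inequality $\kappa(\neg\psi \mid \varphi) > \delta$ means that given $\varphi$ it would be
surprising (i.e., abnormal) by at least $\delta + 1$ additional ranks to find $\neg\psi$. Note that
$\kappa(\neg\psi \mid \varphi) > \delta$ is equivalent to $\kappa(\varphi) + \delta < \kappa(\neg\psi \wedge \varphi)$ or $\kappa(\psi \wedge \varphi) + \delta < \kappa(\neg\psi \wedge \varphi)$,
which is precisely the constraint on worlds we attribute to $\varphi \stackrel{\delta}{\rightarrow} \psi$.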
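
Eqs. 4.4 and 4.5 can be sketched over an explicit, hand-specified ranking. The following Python fragment is a minimal illustration and not part of the formalism itself: the atoms `b` and `f`, the helper names, and the particular ranking are assumptions made for the example, and worlds are enumerated by brute force.

```python
from itertools import product
import math

ATOMS = ("b", "f")  # hypothetical atoms: b = bird, f = flies


def worlds():
    """Enumerate every truth assignment (world) over ATOMS."""
    for values in product([True, False], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def rank(kappa, formula):
    """Eq. 4.4: lowest rank among the models of the wff, infinity if none."""
    ranks = [kappa(w) for w in worlds() if formula(w)]
    return min(ranks) if ranks else math.inf


def cond_rank(kappa, psi, phi):
    """Eq. 4.5: kappa(psi | phi) = kappa(psi & phi) - kappa(phi)."""
    base = rank(kappa, phi)
    if base == math.inf:
        return math.inf
    return rank(kappa, lambda w: psi(w) and phi(w)) - base


# Assumed ranking: a non-flying bird is one unit surprising, all else normal.
def kappa(w):
    return 1 if w["b"] and not w["f"] else 0


bird = lambda w: w["b"]
not_fly = lambda w: not w["f"]
surprise = cond_rank(kappa, not_fly, bird)  # kappa(not f | b)
```

Under this assumed ranking the conditional rank of a non-flying bird is positive, so "birds fly" holds with strength 0 in the sense of Principle 3.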

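Definition 4.2 (Consistency) A ranking $\kappa$ is said to be admissible relative to
a given $\Delta$ iff

\[
\kappa(\varphi_i \wedge \psi_i) + \delta_i < \kappa(\varphi_i \wedge \neg\psi_i)
\tag{4.6}
\]

(equivalently, $\kappa(\neg\psi_i \mid \varphi_i) > \delta_i$) for every rule $\varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i \in \Delta$. A knowledge base $\Delta$
is consistent iff there exists an admissible ranking $\kappa$ relative to $\Delta$.9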
□ 
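
The admissibility condition of Eq. 4.6 is easy to test for a ranking that is given explicitly. The sketch below assumes a ranking supplied as a Python function over worlds and rules supplied as (antecedent, consequent, strength) triples; these representations, and the two example rankings, are illustrative choices rather than anything prescribed by the text.

```python
from itertools import product
import math


def worlds(atoms):
    """Enumerate every truth assignment over the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def rank(kappa, atoms, formula):
    """Rank of a wff: lowest rank among its models, infinity if none."""
    ranks = [kappa(w) for w in worlds(atoms) if formula(w)]
    return min(ranks) if ranks else math.inf


def is_admissible(kappa, atoms, rules):
    """True iff every rule (phi, psi, delta) satisfies Eq. 4.6 under kappa."""
    for phi, psi, delta in rules:
        verified = rank(kappa, atoms, lambda w: phi(w) and psi(w))
        falsified = rank(kappa, atoms, lambda w: phi(w) and not psi(w))
        # A rule with no verifying world is rejected (inf + delta < inf is False).
        if not verified + delta < falsified:
            return False
    return True


ATOMS = ("b", "f")
RULES = [(lambda w: w["b"], lambda w: w["f"], 1)]  # "birds fly", strength 1


def too_flat(w):       # non-flying birds only one rank up: violates Eq. 4.6
    return 1 if w["b"] and not w["f"] else 0


def steep_enough(w):   # non-flying birds two ranks up: satisfies Eq. 4.6
    return 2 if w["b"] and not w["f"] else 0
```

With strength 1, the rule requires the falsifying worlds to sit at least two ranks above the best verifying world, which is why only the second ranking is admissible.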

8 For simplicity we skip the treatment of strict rules. The only necessary change
is in the conditions for admissibility in Def. 4.2. A strict rule $\phi \Rightarrow \sigma$ imposes the following
admissibility constraint: $\kappa(\sigma \wedge \phi) < \kappa(\neg\sigma \wedge \phi)$ and $< \infty$ (see Sec. 5.3, Eq. 5.8).

9 Definition 4.2 represents the ranking equivalent of consistency and admissibility in
Definition 3.2 (and Prp. 3.8) with a slight generalization due to the new parameter $\delta$. For reasons
of simplicity I chose not to introduce a new term such as “$\delta$-consistency”. Also, as shown in
Theorem 4.3, both notions are tested using the same procedure: Procedure Test_Consistency.

Figure 4.1: Consistency and rankings.


As expressed in Section 3.2 (when rankings were first introduced, see Def. 3.7
and Prop. 3.8), Eq. 4.6 echoes the usual interpretation of default rules [117],
according to which $\psi$ holds in all minimal models for $\varphi$. In our case, minimality
is reflected in having the lowest rank, that is, the highest possible likelihood.
Consistency guarantees that in every admissible ranking, each time we find a
world $\omega^-$ violating a rule $\varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i$, there must be a world $\omega^+$ verifying
$\varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i$ such that $\kappa(\omega^+)$ is at least $\delta_i + 1$ units lower than $\kappa(\omega^-)$
(see Fig. 4.1). In probabilistic terms, consistency guarantees that for every $\epsilon > 0$,
there exists a probability distribution $P$ such that if $\varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i \in \Delta$, then
$P(\psi_i \mid \varphi_i) > 1 - c\epsilon^{\delta_i + 1}$.

Let $\tilde{\Delta}$ denote a set of rules identical to $\Delta$ except that all the strengths $\delta_i$ are
removed. Then,

Theorem 4.3 A set $\Delta$ is consistent iff $\tilde{\Delta}$ is p-consistent (Def. 2.2).

Thus, consistency is independent of the strengths and we can use procedure
Test_Consistency (Fig. 2.1) to test for consistency in a polynomial number of
satisfiability tests. It is reassuring that once a knowledge base is consistent for
one set of $\delta$-assignments, it will be consistent with respect to any such assignment,
which means that the rule author has the freedom to modify the $\delta$'s without fear
of forming an inconsistent knowledge base.
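
Theorem 4.3 reduces consistency to a test that ignores the strengths. Procedure Test_Consistency itself appears in Fig. 2.1 and is not reproduced here; the sketch below assumes it is the iterated toleration test (repeatedly discard the rules tolerated by what remains), and it replaces each propositional satisfiability test by brute-force enumeration of worlds. All names are illustrative.

```python
from itertools import product


def worlds(atoms):
    """Enumerate every truth assignment over the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def tolerated(rule, rules, atoms):
    """Some world verifies `rule` (antecedent and consequent both true)
    while falsifying no rule in `rules`."""
    phi, psi = rule
    return any(
        phi(w) and psi(w) and all((not a(w)) or c(w) for a, c in rules)
        for w in worlds(atoms)
    )


def is_consistent(rules, atoms):
    """Iterated toleration: strengths play no role, as in Theorem 4.3."""
    remaining = list(rules)
    while remaining:
        stuck = [r for r in remaining if not tolerated(r, remaining, atoms)]
        if len(stuck) == len(remaining):   # no rule is tolerated: give up
            return False
        remaining = stuck                  # drop the tolerated rules, repeat
    return True


ATOMS = ("p", "b", "f", "a")
PENGUINS = [
    (lambda w: w["b"], lambda w: w["f"]),      # birds fly
    (lambda w: w["p"], lambda w: w["b"]),      # penguins are birds
    (lambda w: w["p"], lambda w: not w["f"]),  # penguins do not fly
    (lambda w: w["f"], lambda w: w["a"]),      # flying things are airborne
]
CLASH = [
    (lambda w: w["b"], lambda w: w["f"]),      # birds fly
    (lambda w: w["b"], lambda w: not w["f"]),  # birds do not fly
]
```

The penguin rules pass the test because the two bird-level rules are tolerated first and the two penguin rules are tolerated once those are set aside; the directly contradictory pair never yields a tolerated rule.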
4.3 Plausible Conclusions: The Z+-Rank


Given a set $\Delta$, each admissible ranking $\kappa$ induces a consequence relation |~$_\kappa$,
where $\phi$ |~$_\kappa$ $\sigma$ iff $\kappa(\sigma \wedge \phi) < \kappa(\neg\sigma \wedge \phi)$. A straightforward way to declare $\sigma$ as a
plausible conclusion of $\Delta$ given $\phi$ would be to require $\phi$ |~$_\kappa$ $\sigma$ in all $\kappa$ admissible
with $\Delta$. However, this will result in an entailment relation equivalent to
p-entailment, which was shown to be too conservative (Chap. 3). Thus, similar to
the approach taken in Chapter 3, we select a distinguished admissible ranking,
$\kappa^+$, and declare $\sigma$ as a plausible conclusion of $\Delta$ given $\phi$, written $\phi$ |~ $\sigma$, iff
$\kappa^+(\phi \wedge \sigma) < \kappa^+(\phi \wedge \neg\sigma)$.10

The ranking $\kappa^+$ assigns to each world the lowest possible rank permitted
by the admissibility constraints of Eq. 4.6. We will first introduce a syntactic
definition of $\kappa^+$ and then show that it satisfies the desired minimality condition.

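Definition 4.4 (The ranking $\kappa^+$) Let $\Delta = \{ r_i \mid r_i = \varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i \}$ be a consistent
set of rules. $\kappa^+$ is defined as follows:

\[
\kappa^+(\omega) =
\begin{cases}
0 & \text{if } \omega \text{ does not falsify any rule in } \Delta \\
\max_{\omega \models \varphi_i \wedge \neg\psi_i} [Z^+(r_i)] + 1 & \text{otherwise}
\end{cases}
\tag{4.7}
\]

where $Z^+(r_i)$ is a priority ordering on rules, defined by

\[
Z^+(r_i) = \min_{\omega \models \varphi_i \wedge \psi_i} [\kappa^+(\omega)] + \delta_i .
\tag{4.8}
\]

□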

Eqs. 4.7 and 4.8 can be viewed as two coupled equations; one defines $\kappa^+$ in terms
of $Z^+$, the second defines $Z^+$ in terms of $\kappa^+$. Figure 4.2 presents an effective
procedure, procedure Z+_order, for computing $Z^+$ from $\Delta$. The significance of
Eq. 4.8 is that the priority function $Z^+$ constitutes an economical way of encoding
the ranking $\kappa^+$, linear in the size of $\Delta$, from which the $\kappa^+$ of any world $\omega$ can be
computed by searching for the highest-$Z^+$ rule violated by $\omega$ in a logarithmic number
(in the number of rules in $\Delta$) of satisfiability tests. The resulting consequence
relation |~ and its associated reasoning procedures are called system-$Z^+$.

We next show (Thm. 4.7 and Cor. 4.8) that Eqs. 4.7 and 4.8 define a unique
admissible ranking function $\kappa^+$ that is minimal in the following sense: Any other
admissible ranking function must assign a higher ranking to at least one world and
a lower ranking to none. To make the result formal, we introduce the following
definitions:

10 If we are concerned with the strength $\delta$ with which the conclusion is endorsed, then $\phi$ |~$_\delta$ $\sigma$
iff $\delta$ is the lowest (positive) integer $l$ satisfying $\kappa^+(\phi \wedge \sigma) + l < \kappa^+(\phi \wedge \neg\sigma)$.

Definition 4.5 (Minimal ranking) A ranking function $\kappa$ is said to be minimal
if every other admissible ranking $\kappa'$ satisfies $\kappa'(\omega) > \kappa(\omega)$ for at least one possible
world $\omega$.

□

Definition 4.6 (Compact rankings) An admissible ranking $\kappa$ is said to be
compact if, for every $\omega'$, any ranking $\kappa'$ satisfying

\[
\begin{array}{ll}
\kappa'(\omega) = \kappa(\omega) & \omega \neq \omega' \\
\kappa'(\omega) < \kappa(\omega) & \omega = \omega'
\end{array}
\]

is not admissible.

□

Theorem 4.7 Every consistent $\Delta$ has a unique compact ranking given by $\kappa^+$.

Corollary 4.8 Every consistent $\Delta$ has a unique minimal ranking given by $\kappa^+$.

Note the similarity between $\kappa^+$ (Eq. 4.7) and the ranking $\kappa^*$ associated with
the maximum entropy approach (reproduced below):

\[
\kappa^*(\omega) =
\begin{cases}
0 & \text{if } \omega \text{ does not falsify any rule in } \Delta, \\
\sum_{\omega \models \varphi_i \wedge \neg\psi_i} [Z^*(r_i)] + 1 & \text{otherwise.}
\end{cases}
\tag{4.9}
\]

While $\kappa^+(\omega)$ is defined by the maximum-priority rule violated in $\omega$, $\kappa^*(\omega)$ depends
on the summation of these priorities. This difference will have implications for
both the computational complexity and the quality of conclusions that these two
proposals sanction.

The computation of the $Z^*$ priorities and the query-answering procedures
for the maximum entropy approach have been proven to be NP-hard even for
Horn clauses (see [9]). In contrast, the computation of $Z^+$ using procedure
Z+_order can be accomplished in $O(|\Delta|^2 \times \log |\Delta|)$ satisfiability tests (Thm. 4.12).
The procedure for computing $Z^+$ is presented in Figure 4.2, and is very similar
to the one in Figure 3.1. Some of the steps in procedure Z+_order invoke a test
of toleration (Def. 2.3). A rule $\phi \stackrel{\delta}{\rightarrow} \sigma$ is tolerated by $\Delta$ if the wff
$\phi \wedge \sigma \wedge \bigwedge_i (\varphi_i \supset \psi_i)$ is satisfiable (where $i$ ranges over all rules in $\Delta$).

Theorem 4.9 establishes the correctness of procedure Z+_order, while
Lemmas 4.10 and 4.11 and Theorem 4.12 determine its (polynomial) complexity.

Procedure Z+_order

Input: A consistent knowledge base $\Delta$. Output: $Z^+$-ranking on rules.

1. Let $\Delta_0$ be the set of rules tolerated by $\Delta$, and let $\mathcal{RZ}^+$ be an empty set.

2. For each rule $r_i = \varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i \in \Delta_0$, set $Z(r_i) = \delta_i$ and $\mathcal{RZ}^+ = \mathcal{RZ}^+ \cup \{r_i\}$.

3. While $\mathcal{RZ}^+ \neq \Delta$, do:

(a) Let $\Delta^+$ be the set of rules in $\Delta' = \Delta - \mathcal{RZ}^+$ tolerated by $\Delta'$.

(b) For each $r : \phi \stackrel{\delta}{\rightarrow} \sigma \in \Delta^+$, let $\Omega_r$ denote the set of models for $\phi \wedge \sigma$ that
do not violate any rule in $\Delta'$; compute

\[
Z(r) = \min_{\omega_r \in \Omega_r} [\kappa(\omega_r)] + \delta
\tag{4.10}
\]

where

\[
\kappa(\omega_r) = \max \{ Z(r_j) \mid \omega_r \models \varphi_j \wedge \neg\psi_j \} + 1
\tag{4.11}
\]

and $r_j : \varphi_j \stackrel{\delta_j}{\rightarrow} \psi_j \in \mathcal{RZ}^+$.

(c) Let $r^*$ be a rule in $\Delta^+$ having the lowest $Z$; set $\mathcal{RZ}^+ = \mathcal{RZ}^+ \cup \{r^*\}$.

End Procedure

Figure 4.2: Procedure for computing the $Z^+$-ordering on rules.
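
The steps of Figure 4.2 can be traced with a small brute-force sketch. The version below is only an illustration under stated assumptions: rules are (antecedent, consequent, strength) triples over Python predicates, every satisfiability test is replaced by enumeration of all worlds (so it is exponential in the number of atoms, unlike the procedure analysed in the text), and the strengths in the usage example are hypothetical.

```python
from itertools import product


def worlds(atoms):
    """Enumerate every truth assignment over the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def verifies(w, rule):
    phi, psi, _ = rule
    return phi(w) and psi(w)


def falsifies(w, rule):
    phi, psi, _ = rule
    return phi(w) and not psi(w)


def z_plus_order(rules, atoms):
    """Map each rule index to its priority, following the steps of Figure 4.2."""
    all_worlds = list(worlds(atoms))
    z = {}  # plays the role of the set RZ+
    # Steps 1-2: rules tolerated by the whole knowledge base get Z = delta.
    for i, r in enumerate(rules):
        if any(verifies(w, r) and not any(falsifies(w, q) for q in rules)
               for w in all_worlds):
            z[i] = r[2]
    # Step 3: rank the remaining rules one at a time.
    while len(z) < len(rules):
        pending = [i for i in range(len(rules)) if i not in z]
        candidates = {}
        for i in pending:
            # Omega_r: worlds verifying r that falsify no pending rule.
            models = [w for w in all_worlds
                      if verifies(w, rules[i])
                      and not any(falsifies(w, rules[j]) for j in pending)]
            if models:  # r is tolerated by the pending rules
                # Eq. 4.11 for each model, then Eq. 4.10.
                kappas = [max((z[j] for j in z if falsifies(w, rules[j])),
                              default=-1) + 1 for w in models]
                candidates[i] = min(kappas) + rules[i][2]
        if not candidates:
            raise ValueError("knowledge base is not consistent")
        best = min(candidates, key=candidates.get)  # Step 3(c)
        z[best] = candidates[best]
    return z


ATOMS = ("p", "b", "f", "a")
DELTAS = (1, 2, 3, 4)  # hypothetical strengths for r1..r4
RULES = [
    (lambda w: w["b"], lambda w: w["f"], DELTAS[0]),      # r1: birds fly
    (lambda w: w["p"], lambda w: w["b"], DELTAS[1]),      # r2: penguins are birds
    (lambda w: w["p"], lambda w: not w["f"], DELTAS[2]),  # r3: penguins do not fly
    (lambda w: w["f"], lambda w: w["a"], DELTAS[3]),      # r4: flying things are airborne
]
Z = z_plus_order(RULES, ATOMS)
```

For these assumed strengths the sketch assigns the two bird-level rules their own strengths and lifts each penguin rule to the strength of "birds fly" plus its own strength plus one, which is the pattern derived for the penguin knowledge base in Example 4.1.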
Theorem 4.9 The function $Z$ computed by procedure Z+_order satisfies Def. 4.4,
that is, $Z = Z^+$.11

Lemma 4.10 Let $\Delta = \{ r_i \mid r_i = \varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i \}$ be a consistent knowledge base in
which rules are sorted according to a priority function $Z(r_i)$. Let $\kappa(\omega)$ be defined as
in Eq. 4.7:

\[
\kappa(\omega) =
\begin{cases}
0 & \text{if } \omega \text{ does not falsify any rule in } \Delta, \\
\max_{\omega \models \varphi_i \wedge \neg\psi_i} [Z(r_i)] + 1 & \text{otherwise.}
\end{cases}
\tag{4.12}
\]

Then, for any wff $\phi$, $\kappa(\phi)$ can be computed in $O(\log |\Delta|)$ propositional satisfiability
tests.


The idea is to perform a binary search on $\Delta$ to find the lowest $Z(r)$ such that there
is a model for $\phi$ that does not violate any rule $r'$ with priority $Z(r') > Z(r)$. This
is done by dividing $\Delta$ into two roughly equal sections: top-half ($r_{mid}$ to $r_{high}$) and
bottom-half ($r_{low}$ to $r_{mid}$). A satisfiability test on the wff $\alpha = \phi \wedge \bigwedge_{j=mid}^{high} (\varphi_j \supset \psi_j)$
decides whether the search should continue (in a recursive fashion) on the
bottom-half or top-half.

Lemma 4.11 The value of $Z(\phi \stackrel{\delta}{\rightarrow} \sigma)$ in Eq. 4.10 can be computed in $O(\log |\mathcal{RZ}^+|)$
satisfiability tests.

Let $\Delta'$ in Step 3(a) of procedure Z+_order be equal to $\{ \varphi_i \stackrel{\delta_i}{\rightarrow} \psi_i \}$; the computation
of Eq. 4.10 is equivalent to computing the $\kappa$ of the wff $\sigma \wedge \phi \wedge \bigwedge_i (\varphi_i \supset \psi_i)$, where $i$
ranges over all the rules in $\Delta'$, by performing the binary search on the set $\mathcal{RZ}^+$.

Theorem 4.12 Given a consistent $\Delta$, the computation of the ranking $Z^+$
requires $O(|\Delta|^2 \times \log |\Delta|)$ satisfiability tests.

Computing Eq. 4.10 in Step 3(b) can be done in $O(\log |\mathcal{RZ}^+|)$ satisfiability tests
according to Lemma 4.11,12 and, since it will be executed at most $O(|\Delta|)$ times,
it requires a total of $O(|\Delta| \times \log |\Delta|)$. Loop 3 is performed at most $|\Delta| - |\Delta_0|$
times, hence the whole computation of the priorities $Z^+$ on rules requires a total
of $O(|\Delta|^2 \times \log |\Delta|)$ satisfiability tests.13

11 Note that Eqs. 4.10 and 4.11 correspond to Eqs. 4.8 and 4.7 in Def. 4.4.

12 Note that we need $\mathcal{RZ}^+$ to be sorted, nondecreasingly, with respect to the priorities $Z$.
This requires that the initial values inserted in $\mathcal{RZ}^+$ in Step 2 of procedure Z+_order be sorted,
taking $O(|\Delta_0|^2)$ data comparisons, and that the new $Z$-value in Step 3(c) be inserted in the
right place, taking $O(|\mathcal{RZ}^+|)$ data comparisons. We are assuming that the cost of each of these
operations is much less than that of a satisfiability test.

Once $Z^+$ is known, determining the strength $\delta$ with which an arbitrary query $\sigma$
is confirmed, given the information $\phi$, requires $O(\log |\Delta|)$ satisfiability tests: First,
$\kappa^+(\phi \wedge \sigma)$ and $\kappa^+(\phi \wedge \neg\sigma)$ are computed, using a binary search as in Lemma 4.10.
Then, these two values are compared and the difference is equated with the
strength $\delta$. Clearly, if the rules in $\Delta$ are of Horn form, computing the priority
ranking $Z^+$ and deciding the plausibility of queries ($\phi$ |~ $\sigma$) can be done in
polynomial time [25].

In the special case of a flat $\Delta$, that is, all $\delta$'s $= 0$, the procedure reduces to the
following steps: First, identify all rules $r_i : \varphi_i \rightarrow \psi_i$ in $\Delta$ for which the formula

\[
\varphi_i \wedge \psi_i \wedge \bigwedge_{j : r_j \in \Delta} (\varphi_j \supset \psi_j)
\tag{4.13}
\]

is satisfiable. Next, assign to these defaults priority $Z^+ = 0$, remove them from $\Delta$,
and repeat the process, assigning to the next set of defaults the priority $Z^+ = 1$,
then $Z^+ = 2$, and so on. Once $Z^+$ is known, the rank $\kappa^+$ of any wff $\phi$ is given
by $\kappa^+(\phi) =$ minimum $i$ such that

\[
\phi \wedge \bigwedge_{j : Z^+(r_j) \geq i} (\varphi_j \supset \psi_j)
\tag{4.14}
\]

is satisfiable. This special case of a flat $\Delta$ constitutes system-Z as introduced by
Pearl [100].

An important result implied by Eqs. 4.13 and 4.14 gives a method of
constructing a propositional theory $Th(\phi)$ that implies all the conclusions $\sigma$ that
plausibly follow from a given evidence $\phi$, that is, $\phi$ |~ $\sigma$. Such a theory is given
by the formula

\[
Th(\phi) = \bigwedge_{i : Z^+(r_i) \geq \kappa^+(\phi)} (\varphi_i \supset \psi_i) .
\tag{4.15}
\]

This is somewhat reminiscent of Brewka's [16] and Poole's [104] idea of
constructing preferred subtheories that are maximally consistent with the context $\phi$.
Here, the construction is more cautious; it stops as soon as all rules of priority
$Z^+ \geq \kappa^+(\phi)$ are included in the theory. Ways of completing the construction
were proposed by Boutilier [15] (see discussion in Sec. 4.7). Note, however, that
in contrast to Brewka's and Poole's proposals, our priorities are computed
automatically from the knowledge base.
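
For the flat case, the layering of Eq. 4.13 and the theory of Eqs. 4.14 and 4.15 can be sketched together. The code below is an illustration under assumptions: rules are (antecedent, consequent) pairs, satisfiability and entailment are decided by enumerating worlds, an extra atom `r` (red) is added only to exercise an irrelevant proposition, and all function names are hypothetical.

```python
from itertools import product


def worlds(atoms):
    """Enumerate every truth assignment over the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def material(rules):
    """Conjunction of the material counterparts (antecedent implies consequent)."""
    return lambda w: all((not a(w)) or c(w) for a, c in rules)


def flat_z(rules, atoms):
    """Layered priorities 0, 1, 2, ... obtained by iterating Eq. 4.13."""
    z, remaining, level = {}, list(range(len(rules))), 0
    while remaining:
        keep = material([rules[i] for i in remaining])
        layer = [i for i in remaining
                 if any(rules[i][0](w) and rules[i][1](w) and keep(w)
                        for w in worlds(atoms))]
        if not layer:
            raise ValueError("knowledge base is not consistent")
        for i in layer:
            z[i] = level
        remaining = [i for i in remaining if i not in layer]
        level += 1
    return z


def theory(phi, rules, z, atoms):
    """Eqs. 4.14-4.15: return Th(phi), or None if phi is unsatisfiable."""
    for i in range(max(z.values(), default=-1) + 2):
        th = material([rules[j] for j in z if z[j] >= i])
        if any(phi(w) and th(w) for w in worlds(atoms)):
            return th  # i is the rank of phi; th keeps the rules with Z >= i
    return None


def plausibly_follows(phi, sigma, rules, z, atoms):
    """phi |~ sigma, decided as entailment from phi together with Th(phi)."""
    th = theory(phi, rules, z, atoms)
    if th is None:
        return True  # nothing to check for an unsatisfiable context
    return all(sigma(w) for w in worlds(atoms) if phi(w) and th(w))


ATOMS = ("p", "b", "f", "a", "r")
RULES = [
    (lambda w: w["b"], lambda w: w["f"]),      # birds fly
    (lambda w: w["p"], lambda w: w["b"]),      # penguins are birds
    (lambda w: w["p"], lambda w: not w["f"]),  # penguins do not fly
    (lambda w: w["f"], lambda w: w["a"]),      # flying things are airborne
]
Z = flat_z(RULES, ATOMS)
```

On this flat knowledge base the bird-level rules land in layer 0 and the penguin rules in layer 1, and the queries about penguin-birds, red birds, and airborne birds come out as reported for system-Z+ in Table 4.2.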

13 The complexity of the rest of the steps in the procedure is bounded by $O(|\Delta|)$ satisfiability
tests.

4.4 Examples


The following examples illustrate properties of the $\kappa^+$-ranking and the use of $\delta$ to
impose priorities among defaults. Example 4.1 shows how the specificity-based
preferences in Example 2.2 are established and maintained by the $\kappa^+$-ranking,
freeing the rule encoder from such considerations.14 In Example 4.2, the strengths
$\delta$ are used to establish preferences when specificity relations are not available.


Example 4.1 (Irrelevance and specificity) Consider the following set of rules
taken from Example 2.2:

$r_1$: $b \stackrel{\delta_1}{\rightarrow} f$
$r_2$: $p \stackrel{\delta_2}{\rightarrow} b$
$r_3$: $p \stackrel{\delta_3}{\rightarrow} \neg f$
$r_4$: $f \stackrel{\delta_4}{\rightarrow} a$

standing for $r_1$: “birds fly”, $r_2$: “penguins are birds”, $r_3$: “penguins don't fly”, and
$r_4$: “flying things are airborne”. The $Z^+$-ordering is computed as follows: Since
both $r_1$ and $r_4$ are tolerated by all the rules in the knowledge base, $Z^+(r_1) = \delta_1$
and $Z^+(r_4) = \delta_4$. Any $\kappa^+$-minimal world verifying $r_2$ or $r_3$ must violate $r_1$;
therefore, following procedure Z+_order, $Z^+(r_2) = \delta_1 + \delta_2 + 1$ and $Z^+(r_3) =
\delta_1 + \delta_3 + 1$. The first column of Table 4.2 contains some queries, the second
contains p-entailed conclusions, and the last contains conclusions entailed by
system-$Z^+$. The reason system-$Z^+$ concludes that “red birds fly” ($r \wedge b$ |~ $f$) is
as follows: Since $r$ is a proposition that does not appear in the knowledge base,
any rule violated by a world $\omega \models b \wedge f$ is also violated by a world $\omega' \models b \wedge r \wedge f$.
Thus, conclusions in system-$Z^+$ are unperturbed by irrelevant propositions. In
general, we have:15

Queries                                          p-entailment    system-$Z^+$
($p \wedge b$, $f$) - “Do penguin-birds fly?”    NO              NO
($r \wedge b$, $f$) - “Do red birds fly?”        Possibly        YES
($b$, $a$) - “Are birds airborne?”               Possibly        YES

Table 4.2: Plausible conclusions in Example 4.1.

14 A general formalization of this behavior is Theorem 4.14.

Proposition 4.13 Let $\Delta$ be a consistent set of defaults, and let $p$ be a proposition
not appearing in any of the defaults in $\Delta$; then $p \wedge \phi$ |~ $\sigma$ iff $\phi$ |~ $\sigma$.

Another interesting conclusion sanctioned by |~ that is not p-entailed is “birds
are normally airborne”. Note that this inference reflects a limited form of rule
chaining (not present in p-entailment).

Finally, as in p-entailment, the priorities $Z^+$ recognize that $r_3$ is more specific
than $r_1$ and sanction “a penguin-bird does not fly”. Note that the preference for
$r_3$ over $r_1$ is established independently of the initial $\delta$'s assigned to these rules.
In the knowledge base above, the priority of $r_3$ (“typically penguins do not fly”)
was adjusted from $\delta_3$ to $\delta_1 + \delta_3 + 1$, so as to supersede $\delta_1$, the priority of the
conflicting rule “typically birds fly”. As a result of such adjustments, the relative
importance of rules is maintained throughout the system, and compliance with
specificity-type constraints is automatically preserved. This is made precise in
the following theorem.

Theorem 4.14 Let $r_1 : \varphi \stackrel{\delta_1}{\rightarrow} \psi$ and $r_2 : \phi \stackrel{\delta_2}{\rightarrow} \sigma$ be two rules in a consistent $\Delta$
such that

1. $\varphi$ |~ $\phi$ (i.e., $\varphi$ is more specific than $\phi$).

2. There is no model satisfying $\varphi \wedge \psi \wedge \phi \wedge \sigma$ (i.e., $r_1$ conflicts with $r_2$).

Then $Z^+(r_1) > Z^+(r_2)$ independent of the values of $\delta_1$ and $\delta_2$.

In other words, the $Z^+$-ordering guarantees that features of more specific contexts
override conflicting features of less specific contexts.

15 Note that this proposition is the system-$Z^+$ counterpart to maximum entropy Prp. 3.15.

Example 4.2 (Belief strength) Consider the following knowledge base (a
subset of Example 2.3):

$r_1$: $q \stackrel{\delta_1}{\rightarrow} p$
$r_2$: $r \stackrel{\delta_2}{\rightarrow} \neg p$

standing for $r_1$: “typically Quakers are pacifists” and $r_2$: “typically Republicans
are not pacifists”. Since each rule is tolerated by the other, the $Z^+$ of each rule is
equal to its associated $\delta$: $Z^+(r_1) = \delta_1$ and $Z^+(r_2) = \delta_2$. Given an individual, say
Nixon, who is both a Republican and a Quaker, the decision of whether Nixon is
a pacifist will depend on whether $\delta_1$ is larger than, less than, or equal to $\delta_2$. This
is because any model $\omega_{rqp}$ for Quakers, Republicans, and pacifists must violate $r_2$,
and consequently, by Eq. 4.7, $\kappa^+(\omega_{rqp}) = \delta_2 + 1$, while any model $\omega_{rq\neg p}$ for Quakers,
Republicans, and non-pacifists must violate $r_1$, that is, $\kappa^+(\omega_{rq\neg p}) = \delta_1 + 1$. In this
case the decision to prefer one world over the other does not depend on specificity
considerations but rather on whether the rule encoder believes that religious
convictions carry more weight than political affiliations.
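
The dependence of the Nixon query on the two strengths can be made concrete with a few lines of Python. The sketch assumes, as stated in the example, that each rule's priority equals its strength, and it evaluates Eq. 4.7 directly on the two relevant worlds; the function names and the atom letters `q`, `r`, `p` are illustrative.

```python
def kappa_plus_world(world, rules):
    """Eq. 4.7 for one world, with rules given as (antecedent, consequent, z_plus)."""
    violated = [z for ante, cons, z in rules if ante(world) and not cons(world)]
    return max(violated) + 1 if violated else 0


def nixon(delta1, delta2):
    """Decide the query for a Quaker Republican, given the two rule strengths."""
    rules = [
        (lambda w: w["q"], lambda w: w["p"], delta1),      # r1: Quakers are pacifists
        (lambda w: w["r"], lambda w: not w["p"], delta2),  # r2: Republicans are not
    ]
    pacifist = kappa_plus_world({"q": True, "r": True, "p": True}, rules)
    non_pacifist = kappa_plus_world({"q": True, "r": True, "p": False}, rules)
    if pacifist < non_pacifist:   # the pacifist world is the less surprising one
        return "pacifist"
    if pacifist > non_pacifist:
        return "not pacifist"
    return "undecided"
```

A stronger Quaker rule makes the non-pacifist world the more surprising one, so the pacifist conclusion wins; reversing the strengths reverses the verdict, and equal strengths leave the query undecided.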

The main shortcomings of system-$Z^+$ are discussed in Sections 3.4 and 4.7.

4.5 Belief Change, Soft Evidence, and Imprecise Observations

So far, a query $\phi$ |~ $\sigma$ was defined as a pair of Boolean formulas ($\phi$, $\sigma$), where
$\phi$ (the context) stands for the set of observations at hand and $\sigma$ (the target)
stands for the conclusion whose belief we wish to confirm, deny, or assess. A
query ($\phi$, $\sigma$) would be answered in the affirmative if $\sigma$ was found to hold in all
minimally ranked models of $\phi$, and the degree of belief in $\sigma$ would be given by
$\kappa(\neg\sigma \wedge \phi) - \kappa(\sigma \wedge \phi)$.

In many cases, however, the queries we wish to answer cannot be cast in this
format, because our set of observations is not precise enough to be articulated as a
crisp Boolean formula. For example, assume that we are throwing a formal party
and our friends Mary and Bill are invited. However, judging from their previous
behavior, we believe “if Mary goes to the party, Bill will stay home” (with strength
$\delta$), written $M \stackrel{\delta}{\rightarrow} \neg B$. Now assume that we have a strong hunch (with degree
$K$) that Mary will go to the party (perhaps because she is extremely well dressed
and is not consulting the movie section in the Times) and we wish to inquire
whether Bill will stay home. It would be inappropriate to query the system
with the pair ($M$, $\neg B$), because the context $M$ has not been established beyond
doubt. The difference could be critical if we have arguments against “Bill staying
home” (e.g., he was seen renting a tuxedo). A flexible system should allow the
user to assign a degree of belief to each observational proposition in the context
$\phi$ and proceed with analyzing their rational consequences. Thus, a query should
consist of a tuple like ($\phi_1, K_1; \phi_2, K_2; \ldots; \phi_m, K_m : \sigma$), where each $K_i$ measures
the degree to which the contextual proposition $\phi_i$ is supported by evidence.16

At first glance it might seem that system-$Z^+$ would automatically provide
such a facility through the use of variable-strength rules. For example, to express
the fact that Mary is believed to be going to the party, we can perhaps use a
dummy rule $Obs_1 \rightarrow M$ (stating that if Mary meets the set of observations $Obs_1$,
then Mary is believed to be going to the party) and then add the proposition
$Obs_1$ to the context part of the query, to indicate that $Obs_1$ has taken place.

This proposal has several shortcomings, however. First, in many systems it
is convenient to treat if-then rules as a stable part of our knowledge, unperturbed
by observations made about a particular individual or in any specific set
of circumstances. This permits us to compile rules into a structure that allows
efficient processing over a long stream of queries. Adding query-induced rules to
the knowledge base will neutralize this facility.

Second, rules and observations combine differently: The latter should
accumulate, the former do not. For example, if we have two rules $a \stackrel{\delta_1}{\rightarrow} c$ and $b \stackrel{\delta_2}{\rightarrow} c$ and
we observe $a$ and $b$, system-$Z^+$ would believe $c$ to a degree $\max(\delta_1, \delta_2)$. However,
if $a$ and $b$ provide two independent reasons for believing $c$, the two observations
together should endow $c$ with a belief that is stronger than any one component
in isolation. To incorporate such cumulative pooling of evidence, we must encode
the assumption that $a$ and $b$ are conditionally independent (given $c$), which is not
automatically embodied in system-$Z^+$.17

To avoid these complications, the method we propose treats imprecise
observations by invoking specialized conditioning operators, unconstrained by a
rule’s semantics. We distinguish between two types of evidential reports:

1. Type-J: “All things considered,” our current belief in $\phi$ should become $J$.

2. Type-L: “Nothing else considered,” our current belief in $\phi$ should shift by $L$.

16We remark that evidence in this dissertation is regarded as setting the
context of a query and not as a modifier of the knowledge in $\Delta$.
Statistical methods for accomplishing the latter task are explored by
Bacchus [6].

17The assumption of conditional independence among converging rules is embodied
in the formalism of maximum entropy (see Chapter 3 and [47]), as well as in the
causal interpretation of Chapter 5.
4.5.1 Type-J: All Things Considered

Let $\phi$ be the wff representing the event whose belief we wish to update so
that $\kappa'(\neg\phi) = J$ (and, consequently, $\kappa'(\phi) = 0$).18 In
order to compute $\kappa'(\psi)$ for every wff $\psi$, we rely upon Jeffrey’s
rule of conditioning [99]. Jeffrey’s rule is based on the assumption that when
an agent reports that an observation changed her degree of belief in $\phi$,
such observation does not normally change the conditional degree of belief in
any propositions conditional on the evidence $\phi$ or on the evidence
$\neg\phi$ [99]. Thus, letting $P$ and $P'$ denote the agent’s probability
distribution before and after the observation respectively, we have19

$$P'(\psi \mid \phi) = P(\psi \mid \phi) \quad \text{and} \quad P'(\psi \mid \neg\phi) = P(\psi \mid \neg\phi), \tag{4.16}$$

which leads to Jeffrey’s rule,

$$P'(\psi) = P(\psi \mid \phi)\,P'(\phi) + P(\psi \mid \neg\phi)\,P'(\neg\phi). \tag{4.17}$$

Translated into the language of rankings (using Eqs. 4.1-4.3), Eq. 4.17 yields

$$\kappa'(\psi) = \min[\kappa(\psi \mid \phi) + \kappa'(\phi);\ \kappa(\psi \mid \neg\phi) + \kappa'(\neg\phi)], \tag{4.18}$$

which offers a convenient way of computing $\kappa'(\psi)$ once we specify
$\kappa'(\phi) = 0$ and $\kappa'(\neg\phi) = J$. Eq. 4.18 assumes an especially
attractive form when computing the $\kappa'$ of a world $\omega$:

$$\kappa'(\omega) = \begin{cases} \kappa(\omega \mid \phi) + \kappa'(\phi) & \text{if } \omega \models \phi, \\ \kappa(\omega \mid \neg\phi) + \kappa'(\neg\phi) & \text{if } \omega \models \neg\phi. \end{cases} \tag{4.19}$$

Eq. 4.19 corresponds exactly to the $\alpha$-conditionalization proposed in
Spohn [120] (Def. 6, p. 117), with $\alpha = J$. If $J = \infty$, this process
is equivalent to ordinary Bayesian conditionalization, since
$\kappa'(\omega) = \kappa(\omega \mid \phi)$ if $\omega \models \phi$ and
$\kappa'(\omega) = \infty$ otherwise. Note, however, that in general this
conditionalization is not commutative; if $\phi_1$ and $\phi_2$ are mutually
dependent (i.e., $\kappa(\phi_2 \mid \phi_1) \neq \kappa(\phi_2)$),20 the order
in which we establish $\kappa(\neg\phi_1) = J_1$ and $\kappa(\neg\phi_2) = J_2$
might make a difference in our final belief state.21 This is not surprising
since in the “all things considered” interpretation the last report is presumed
to summarize all previous observations.

18This is an immediate consequence of the semantics for rankings and corresponds
to the normalization in probability theory (see Eq. 4.2).

19Eq. 4.16 is known as the J-condition [99].

20This condition mirrors probabilistic dependence, namely,
$P(\phi_2 \mid \phi_1) \neq P(\phi_2)$.
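
The world-level form of J-conditionalization (Eq. 4.19) and its sensitivity to the order of reports can be illustrated with a minimal Python sketch. The dictionary representation of a ranking, the helper names `rank` and `j_condition`, and the four-world ranking used below are all assumptions made for illustration; they are not taken from the text.

```python
INF = float("inf")

def rank(kappa, prop):
    """kappa(prop): the lowest rank among the worlds that satisfy prop."""
    return min((k for w, k in kappa.items() if prop(w)), default=INF)

def j_condition(kappa, phi, J):
    """World-level J-conditionalization (Eq. 4.19) with
    kappa'(phi) = 0 and kappa'(not phi) = J.

    Assumes both phi and its negation have finite rank under kappa.
    """
    k_phi = rank(kappa, phi)
    k_not = rank(kappa, lambda w: not phi(w))
    # kappa(w | phi) = kappa(w) - kappa(phi), and likewise for not phi.
    return {w: (k - k_phi) if phi(w) else (k - k_not + J)
            for w, k in kappa.items()}

# Hypothetical ranking over worlds (p, q) in which p and q are correlated.
kappa = {(True, True): 0, (True, False): 2, (False, True): 2, (False, False): 0}
not_p = lambda w: not w[0]
q = lambda w: w[1]

# Report 1: "kappa(p) should become 1".  Report 2: "kappa(not q) should become 1".
first_p_then_q = j_condition(j_condition(kappa, not_p, 1), q, 1)
first_q_then_p = j_condition(j_condition(kappa, q, 1), not_p, 1)

print(first_p_then_q == first_q_then_p)          # False: the order matters
print(rank(first_p_then_q, lambda w: w[0]))      # 0: the earlier report on p is overridden
print(rank(first_q_then_p, lambda w: not w[1]))  # 0: the earlier report on q is overridden
```

In both orders the last report holds exactly, while the earlier one is absorbed into it, which is the behavior expected of an “all things considered” report.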


4.5.2  Type-L  Reports:  Nothing  Else  Considered 


L-conditionalization is appropriate for evidential reports of the type “new
evidence was obtained which, by its own merit, would support $\phi$ to degree
$L$.” Unlike J-conditionalization, the degree $L$ now specifies changes in the
belief of $\phi$, not the absolute value of the final belief in $\phi$. As in
the case of type-J reports, we assume that in naming $\phi$ as the direct
beneficiary of the evidence, the intent is to convey the assumption of
conditional independence, as formulated in Eq. 4.17. Next, following the method
of virtual conditionalization [97], we assume that the degree of support $L$
characterizes the likelihood ratio $\lambda(\phi)$ associated with some
undisclosed observation $Obs$:

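$$\lambda(\phi) = \frac{P(Obs \mid \phi)}{P(Obs \mid \neg\phi)}, \tag{4.20}$$

which governs the updates via the product rule

$$\frac{P'(\phi)}{P'(\neg\phi)} = \lambda(\phi)\,\frac{P(\phi)}{P(\neg\phi)}. \tag{4.21}$$

Translated into the language of rankings, this assumption yields

$$\kappa'(\phi) - \kappa'(\neg\phi) = \kappa(\phi) - \kappa(\neg\phi) - L \tag{4.22}$$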

and, since either $\kappa'(\phi)$ or $\kappa'(\neg\phi)$ must be zero, we obtain

$$\kappa'(\phi) = \max[0;\ \kappa(\phi) - \kappa(\neg\phi) - L], \tag{4.23}$$

$$\kappa'(\neg\phi) = \max[0;\ \kappa(\neg\phi) - \kappa(\phi) + L]. \tag{4.24}$$
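
The passage from Eq. 4.21 to Eqs. 4.22-4.24 can be made explicit with a short sketch. It assumes the order-of-magnitude reading of rankings, in which $P(\phi)$ behaves like $\epsilon^{\kappa(\phi)}$ for an infinitesimal $\epsilon$, and a likelihood ratio of order $\epsilon^{-L}$; both conventions are taken here as assumptions about Eqs. 4.1-4.3 rather than quoted from them.

$$\frac{P'(\phi)}{P'(\neg\phi)} \sim \epsilon^{\kappa'(\phi) - \kappa'(\neg\phi)}, \qquad \lambda(\phi)\,\frac{P(\phi)}{P(\neg\phi)} \sim \epsilon^{-L}\,\epsilon^{\kappa(\phi) - \kappa(\neg\phi)}.$$

Equating the exponents gives $\kappa'(\phi) - \kappa'(\neg\phi) = d$ with $d = \kappa(\phi) - \kappa(\neg\phi) - L$. Normalization requires $\min[\kappa'(\phi), \kappa'(\neg\phi)] = 0$, so

$$d \geq 0 \;\Rightarrow\; \kappa'(\phi) = d,\ \kappa'(\neg\phi) = 0; \qquad d < 0 \;\Rightarrow\; \kappa'(\phi) = 0,\ \kappa'(\neg\phi) = -d,$$

which is the content of $\kappa'(\phi) = \max[0; d]$ and $\kappa'(\neg\phi) = \max[0; -d]$.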


We see that the effect of L-conditionalization is to shift the difference
between the degrees of disbelief in $\phi$ and $\neg\phi$ by the specified
amount $L$. Once $\kappa'(\phi)$ is known, Jeffrey’s rule (Eq. 4.18) can be
used to compute $\kappa'(\psi)$ for an arbitrary wff $\psi$, yielding

$$\kappa'(\psi) = \begin{cases} \min[\kappa(\psi \mid \phi) + \kappa(\phi) - \kappa(\neg\phi) - L;\ \kappa(\psi \mid \neg\phi)] \\ \min[\kappa(\psi \mid \phi);\ \kappa(\psi \mid \neg\phi) + \kappa(\neg\phi) + L - \kappa(\phi)] \\ \min[\kappa(\psi \mid \phi);\ \kappa(\psi \mid \neg\phi)] \end{cases} \tag{4.25}$$

depending on whether $\kappa(\phi) - \kappa(\neg\phi)$ is greater than, less
than, or equal to $L$. This expression takes the following form for
$\kappa'(\omega)$:

$$\kappa'(\omega) = \begin{cases} \kappa(\omega \mid \phi) + \max[0;\ \kappa(\phi) - \kappa(\neg\phi) - L] & \text{if } \omega \models \phi, \\ \kappa(\omega \mid \neg\phi) + \max[0;\ \kappa(\neg\phi) - \kappa(\phi) + L] & \text{if } \omega \models \neg\phi. \end{cases} \tag{4.26}$$

21Spohn ([120], p. 118) has acknowledged the desirability of commutativity in
evidence pooling but has not stressed that $\alpha$-conditionalization commutes
only in a very narrow set of circumstances (partially specified by his Thm. 4).
These circumstances require that successive pieces of evidence support only
propositions that are relatively independent: the truth of one proposition
should not imply a belief in another. Shenoy [114] has corrected this
deficiency by devising a commutative combination rule that behaves like
L-conditioning.

As in J-conditionalization, if $L = \infty$ then
$\kappa'(\omega) = \kappa(\omega \mid \phi)$. For the general case, we can see
that the effect of L-conditionalization is to shift downward the $\kappa$ of
all worlds that are models of the supported proposition $\phi$ relative to the
$\kappa$ of all worlds that are not models for $\phi$. However, unlike
J-conditionalization, the net relative shift is constant and is equal to $L$,
independent of the initial value of $\kappa(\phi)$. It is easy to verify that
L-conditionalization is commutative (as is its probabilistic counterpart, see
Eq. 4.21), and hence it permits a recursive implementation in the case of
multiple evidence.
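
The commutativity claim, and the way repeated support for the same proposition accumulates, can be checked on a small case. The following Python sketch implements the world-level update of Eq. 4.26; the dictionary representation, the helper names, and the four-world ranking are assumptions chosen for illustration only.

```python
INF = float("inf")

def rank(kappa, prop):
    """kappa(prop): the lowest rank among the worlds that satisfy prop."""
    return min((k for w, k in kappa.items() if prop(w)), default=INF)

def l_condition(kappa, phi, L):
    """World-level L-conditionalization (Eq. 4.26).

    Assumes both phi and its negation have finite rank under kappa.
    """
    k_phi = rank(kappa, phi)
    k_not = rank(kappa, lambda w: not phi(w))
    new_phi = max(0, k_phi - k_not - L)   # Eq. 4.23
    new_not = max(0, k_not - k_phi + L)   # Eq. 4.24
    return {w: (k - k_phi + new_phi) if phi(w) else (k - k_not + new_not)
            for w, k in kappa.items()}

# Hypothetical ranking over worlds (p, q).
kappa = {(True, True): 0, (True, False): 2, (False, True): 2, (False, False): 0}
p = lambda w: w[0]
q = lambda w: w[1]

p_then_q = l_condition(l_condition(kappa, p, 1), q, 2)
q_then_p = l_condition(l_condition(kappa, q, 2), p, 1)
print(p_then_q == q_then_p)               # True: the order is immaterial

# Two independent reports supporting q pool into a single, stronger one.
pooled = l_condition(l_condition(kappa, q, 1), q, 2)
print(pooled == l_condition(kappa, q, 3))  # True
```

The second check mirrors the requirement that observations accumulate: two “nothing else considered” reports of strengths 1 and 2 act like one report of strength 3.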

We can illustrate these updating schemes through the party example consisting
of the single rule $r_m : M \rightarrow \neg B$ (“if Mary goes to the party,
then Bill will not go”). A trivial application of procedure Z+_order yields
$Z^+(r_m) = 4$, and using Eqs. 4.4 and 4.7 we find $\kappa^+(x) = 0$ for every
proposition $x$ except $x = B \wedge M$, for which $\kappa^+(M \wedge B) = 5$.
This means that we have no reason to believe that either Mary or Bill will go
to the party, but we are pretty sure that both of them will not show up. Now
suppose we see that Mary is very well dressed, and this observation makes our
belief in $M$ increase to 3, that is, $\kappa^{+\prime}(\neg M) = 3$. As a
consequence, our belief in Bill staying home also increases to 3 since, using
either J-conditionalization or L-conditionalization,
$\kappa^{+\prime}(B) = 3$. Next, suppose that someone tells us that he has a
strong hunch that Bill plans to show up for the party, but fails to tell us
why. There are two ways in which this report can influence our beliefs. The
natural way would be to assume that our informer has not seen Mary’s dress and
even might not be aware of Bill and Mary’s relationship; hence we assess the
impact of his report in isolation and say that whatever the value of our
current belief in Bill going, it should increase by 3 increments, or $L = 3$.
Following Eq. 4.25, $\kappa^{+\prime\prime}(B)$ and
$\kappa^{+\prime\prime}(\neg M)$ will both be equal to 0, and we are back to
the initial uncertainty about Bill or Mary going to the party, except that our
disbelief in both Mary and Bill being at the party has decreased to
$\kappa^{+\prime\prime}(M \wedge B) = 2$. A second way would be to assume that
our informer is omniscient and already has taken into consideration all we know
about Bill and Mary. He means for us to revise our rankings so that the final
belief in “Bill going” will be fixed at $\kappa^{+\prime\prime}(\neg B) = 3$.
With this interpretation, we J-condition $\kappa^{+\prime}$ on the proposition
$\phi = B$ and obtain $\kappa^{+\prime\prime}(M) = \min[2 + 0;\ 0 + 3] = 2$
(by Eq. 4.18), concluding that it is Mary who will not show up to the party
after all.
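
The numbers in the party example can be recomputed mechanically. The sketch below is a minimal Python rendering of Eqs. 4.19 and 4.26 over the four worlds $(M, B)$; the initial ranking is the one stated above ($\kappa^+(M \wedge B) = 5$, all other worlds at 0), while the dictionary representation and the helper names (`rank`, `reassign`, `j_condition`, `l_condition`) are assumptions introduced here. Under Eq. 4.19 the final J-conditioning step evaluates to $\kappa^{+\prime\prime}(M) = \min(2, 3) = 2$.

```python
INF = float("inf")

def rank(kappa, prop):
    """kappa(prop): the lowest rank among the worlds that satisfy prop."""
    return min((k for w, k in kappa.items() if prop(w)), default=INF)

def reassign(kappa, phi, new_phi, new_not):
    """Jeffrey-style update: keep ranks conditional on phi and on not phi,
    and reset kappa(phi), kappa(not phi) to the given values."""
    k_phi = rank(kappa, phi)
    k_not = rank(kappa, lambda w: not phi(w))
    return {w: (k - k_phi + new_phi) if phi(w) else (k - k_not + new_not)
            for w, k in kappa.items()}

def j_condition(kappa, phi, J):          # Eq. 4.19
    return reassign(kappa, phi, 0, J)

def l_condition(kappa, phi, L):          # Eq. 4.26
    d = rank(kappa, phi) - rank(kappa, lambda w: not phi(w)) - L
    return reassign(kappa, phi, max(0, d), max(0, -d))

# Worlds are pairs (M, B); only the world violating M -> not B is surprising.
kappa0 = {(True, True): 5, (True, False): 0, (False, True): 0, (False, False): 0}
M, B = (lambda w: w[0]), (lambda w: w[1])
not_M, not_B = (lambda w: not w[0]), (lambda w: not w[1])
both = lambda w: w[0] and w[1]

# Mary is well dressed: belief in M rises to 3 (J- and L-conditioning agree).
k1 = j_condition(kappa0, M, 3)
print(k1 == l_condition(kappa0, M, 3))                        # True
print(rank(k1, not_M), rank(k1, B))                           # 3 3

# The hunch about Bill, read as a type-L report with L = 3.
k2_L = l_condition(k1, B, 3)
print(rank(k2_L, B), rank(k2_L, not_M), rank(k2_L, both))     # 0 0 2

# The same hunch, read as a type-J report fixing kappa(not B) = 3.
k2_J = j_condition(k1, B, 3)
print(rank(k2_J, not_B), rank(k2_J, M))                       # 3 2
```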

4.5.3  Complexity  Analysis 

From Eq. 4.18 we see that $\kappa'(\psi)$ can be computed from
$\kappa(\psi \mid \phi)$ and $\kappa(\psi \mid \neg\phi)$, which, assuming we
have $Z^+$, requires a logarithmic number of propositional satisfiability tests
(see Sec. 4.3). L-conditionalization can follow a similar route (see Eq. 4.25).

Special precautions must be taken when simultaneous, multiple pieces of
evidence become available. First, J-conditionalization is not commutative,
hence we cannot simply compute $\kappa'$ by J-conditioning on $\phi_1$ and then
J-conditioning $\kappa'$ on $\phi_2$ to get $\kappa''$. We must J-condition
simultaneously on $\phi_1$ and $\phi_2$ with their respective J-levels, say
$J_1$ and $J_2$. Worse yet, an auxiliary effort must be expended to compute the
J-level of each combination of $\phi$’s, in our case $\phi_1 \wedge \phi_2$,
$\phi_1 \wedge \neg\phi_2$, etc. This is no doubt a hopeless computation when
the number of observations is large.

L-conditionalization, by virtue of its commutativity, enjoys the benefits of
recursive computations. Let $e_1$ and $e_2$ be two (undisclosed) pieces of
evidence supporting $\phi_1$ (with strength $L_1$) and $\phi_2$ (with strength
$L_2$), respectively. We first L-condition $\kappa$ on $\phi_1$ and calculate
$\kappa'(\phi_1)$ and $\kappa'(\phi_2)$ using Eq. 4.24 and Eq. 4.25,
respectively. Applying Eq. 4.25 this time to $\kappa'(\psi \wedge \phi_2)$, we
calculate $\kappa'(\psi \mid \phi_2)$. Second, we L-condition $\kappa'$ on
$\phi_2$, compute $\kappa''(\phi_2)$ using Eq. 4.24, and finally, using
$\kappa'(\psi \mid \phi_2)$ and $\kappa''(\phi_2)$ in Eq. 4.25 obtain
$\kappa''(\psi)$.22 Note that, although each of these calculations requires
only $O(\log|\Delta|)$ satisfiability tests, this computation is effective only
when we have a well designated target hypothesis $\psi$ to estimate. The
computation must be repeated each time we change the target hypothesis, even
when the context remains unaltered. This is because we no longer have a
facility for economically encoding a complete description of $\kappa'$, as we
had for $\kappa$ (using the $Z^+$-function). Thus, the encoding for $\kappa'$
may not be as economical as that for $\kappa$ (the number of worlds is
astronomical), unless we manage to find dummy rules that emulate the
constraints imposed on $\phi_i$ by the (undisclosed) observation. Such dummy
rules must enforce the conditional independence constraints embedded in
Eq. 4.17, without violating the admissibility constraints (Eq. 4.6) in
$\Delta$. These dummy rules can be encoded using the stratification mechanism
proposed in Chapter 5 (see also [54]).

22The generalization to more than two pieces of evidence is straightforward.

4.6  Relation  to  the  AGM  Theory  of  Belief  Revision 

Alchourrón, Gärdenfors, and Makinson (AGM) have advanced a set of postulates
that have become a standard against which proposals for belief revision are
tested [3]. The AGM postulates model epistemic states as deductively closed sets
of (believed) sentences and characterize how a rational agent should change its
epistemic states when new beliefs are added, subtracted, or changed. The central
result is that the postulates are equivalent to the existence of a complete
preordering of all propositions according to their degree of epistemic entrenchment such
that  belief  revisions  always  retain  more  entrenched  propositions  in  preference  to 
less  entrenched  ones.  Although  the  AGM  postulates  do  not  provide  a  calculus 
with  which  one  can  realize  the  revision  process  or  even  specify  the  content  of  an 
epistemic  state  [14,  27,  92],  they  nevertheless  imply  that  a  rational  revision  must 
behave  as  though  propositions  were  ordered  on  some  scale. 

Spohn [120] has shown how belief revision conforming to the AGM postulates can
be embodied in the context of ranking functions. Once we specify a single
ranking function $\kappa$ on possible worlds, we can associate the set of
beliefs with those propositions $\beta$ for which $\kappa(\neg\beta) > 0$. It
follows, then, that the models for the theory $\psi$ representing our beliefs
(written $Mods(\psi)$) consist of those worlds $\omega$ for which
$\kappa(\omega) = 0$. To incorporate a new belief $\phi$, one can raise the
$\kappa$ of all models of $\neg\phi$ relative to those of $\phi$, until
$\kappa(\neg\phi)$ becomes (at least) 1, at which point the newly shifted
ranking defines a new set of beliefs. This process of belief revision, which
Spohn named $\alpha$-conditioning (with $\alpha = 1$ for this particular case),
was shown to comply with the AGM postulates [33]. It follows then that the
process of revising beliefs in all three forms of conditioning also obeys the
AGM postulates: Ordinary conditioning amounts to setting $\alpha = \infty$,
J-conditioning amounts to $\alpha = J$, while L-conditioning calls for shifting
the models of $\phi$ relative to those of $\neg\phi$ by $L$ units of surprise.
If we denote by $\kappa_\phi(\omega)$ the revised ranking after conditioning
(with $\alpha = \infty$), then the dynamics of belief is governed by the
following equation:

$$\kappa_\phi(\omega) = \begin{cases} \kappa(\omega) - \kappa(\phi) & \text{if } \omega \models \phi, \\ \infty & \text{otherwise.} \end{cases} \tag{4.27}$$

Accordingly, testing whether a given sentence $\sigma$ is believed after
revision amounts to testing whether $\kappa_\phi(\neg\sigma) > 0$ or,
equivalently, whether $\kappa(\neg\sigma \mid \phi) > 0$.

The unique feature of the system described in this chapter is that the above
test can be performed by purely syntactic means, involving only the rules in
$\Delta$. These computations are demonstrated in the following example, where
the rankings in Tables 4.3-4.5 are shown for illustrative purposes only.


Example 4.3 (Working students) The set
$\Delta = \{s \rightarrow \neg w,\ s \rightarrow a,\ a \rightarrow w\}$ stands
for “typically students don’t work”, “typically students are adults”, and
“typically adults work”, respectively.23 The $Z^+$-ordering on the rules
(computed according to Eq. 4.13) is $Z^+(a \rightarrow w) = 0$ and
$Z^+(s \rightarrow \neg w) = Z^+(s \rightarrow a) = 1$, from which the initial
$\kappa^+$ ranking can be computed (Eq. 4.7), as depicted in Table 4.3. The
rankings in Tables 4.4 and 4.5 show the revised rankings after observing an
adult ($\kappa^+_a$) and a student ($\kappa^+_s$), respectively.

$\kappa^+$   Possible worlds
0            $(\neg s, a, w)$, $(\neg s, \neg a, w)$, $(\neg s, \neg a, \neg w)$
1            $(\neg s, a, \neg w)$, $(s, a, \neg w)$
2            $(s, a, w)$, $(s, \neg a, \neg w)$, $(s, \neg a, w)$

Table 4.3: Initial ranking for the student triangle in Example 4.3.

The beliefs associated with these rankings can be computed from the worlds
residing in $\kappa^+ = 0$. Thus, in $\kappa^+_a$ “an adult works”, whereas in
$\kappa^+_s$ “a student is an adult that does not work”. These beliefs can be
computed more conveniently by syntactic analysis of the rules and their
$Z^+$-ordering, either by using Eq. 4.14, or by extracting from $\Delta$ a
propositional theory that is maximally consistent with the observation using
Eq. 4.15. For example, the beliefs associated with observing a student $s$ are
given by the theory $\{s,\ s \supset a,\ s \supset \neg w\}$. These two
implications mirror the rules $s \rightarrow \neg w$ and $s \rightarrow a$,
which are the unique set of rules that are maximally consistent with $s$.

$\kappa^+_a$   Possible worlds
0              $(\neg s, a, w)$
1              $(\neg s, a, \neg w)$, $(s, a, \neg w)$
2              $(s, a, w)$

Table 4.4: Revised ranking after observing an adult.

$\kappa^+_s$   Possible worlds
0              $(s, a, \neg w)$
1              $(s, a, w)$, $(s, \neg a, \neg w)$, $(s, \neg a, w)$

Table 4.5: Revised ranking after observing a student.

23Note that all $\delta_i$’s are 0 for this example.
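
The rankings of the working-students example can be regenerated from the rule priorities alone. The Python sketch below takes the $Z^+$ values quoted in the example as given and assumes that Eq. 4.7 has the usual form (a world’s rank is 0 if it falsifies no rule, and otherwise one more than the highest priority among the rules it falsifies); that form, the tuple encoding of worlds as $(s, a, w)$, and the helper names are assumptions, not quotations.

```python
from itertools import product

INF = float("inf")

# (antecedent, consequent, Z+ value from the example); worlds are (s, a, w).
RULES = [
    (lambda s, a, w: s, lambda s, a, w: not w, 1),   # s -> not w
    (lambda s, a, w: s, lambda s, a, w: a,     1),   # s -> a
    (lambda s, a, w: a, lambda s, a, w: w,     0),   # a -> w
]

def kappa_plus(world):
    """Assumed form of Eq. 4.7: 1 + highest Z+ among falsified rules, else 0."""
    hit = [z for ante, cons, z in RULES if ante(*world) and not cons(*world)]
    return 1 + max(hit) if hit else 0

WORLDS = list(product([True, False], repeat=3))
kappa = {w: kappa_plus(w) for w in WORLDS}

def condition(kappa, phi):
    """Ordinary conditioning (Eq. 4.27)."""
    k_phi = min(k for w, k in kappa.items() if phi(w))
    return {w: (k - k_phi if phi(w) else INF) for w, k in kappa.items()}

def believed(kappa, sigma):
    """sigma is believed iff every world falsifying it has positive rank."""
    return min((k for w, k in kappa.items() if not sigma(w)), default=INF) > 0

kappa_a = condition(kappa, lambda w: w[1])   # observe an adult
kappa_s = condition(kappa, lambda w: w[0])   # observe a student

print(believed(kappa_a, lambda w: w[2]))                  # True: an adult works
print(believed(kappa_s, lambda w: w[1] and not w[2]))     # True: adult, not working
print(sorted(w for w, k in kappa_s.items() if k == 0))    # [(True, True, False)]
```

Enumerating worlds is exponential in the number of atoms and is used here only to make the tables reproducible; the point of the surrounding discussion is that the same answers are available syntactically from the rules.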

There  are  several  computational  and  epistemological  advantages  to  basing  the 
revision  process  on  a  finite  set  of  conditional  rules,  rather  than  on  the  beliefs  or 
on  the  rankings  or  the  expectations  that  emanate  from  those  rules.  The  number 
of  propositions  in  one’s  belief  set  is  astronomical,  as  is  the  number  of  worlds, 
while  the  number  of  rules  is  usually  manageable. 

This  computational  necessity  has  been  recognized  by  several  researchers.  For 
example, Nebel [92] adapted the AGM theory so that finite sets of base
propositions mediate revisions. The basic idea in this syntax-based system is to define a
(total)  priority  order  on  the  set  of  base  propositions  and  to  select  revisions  to  be 
maximally  consistent  relative  to  that  order,  as  exemplified  in  the  nonmonotonic 
systems  of  Brewka  [16]  and  Poole  [105]  and  in  Example  4.3.  Nebel  has  shown 
that  such  a  strategy  can  satisfy  almost  all  the  AGM  postulates.  Boutilier  [14] 
has  further  shown  that,  indeed,  the  priority  function  Z+  corresponds  naturally 
to  the  epistemic  entrenchment  ordering  of  the  AGM  theory.24 

24The proof in [14] considers the priorities $Z^+$ resulting from a flat set of
rules as in system-$Z$ [100]. Boutilier [15] also shows that an entrenchment
ordering obeying the AGM framework obtains from the $Z$-priorities of the
negation of the material counterparts of rules.

Unfortunately, even Nebel’s theory does not completely succeed in formalizing
the practice of belief revision, as it does not specify how the priority order
on the base propositions is to be determined. Although one can imagine, in
principle, that the knowledge encoder specifies this priority order in advance,
such specification would be impractical, since the order might (and, as we have
seen, should) change whenever new rules are added to the knowledge base. By
contrast, system-$Z^+$ extracts both beliefs and rankings of beliefs
automatically from the content of $\Delta$; no outside specification of belief
orderings is required.

Finally, and perhaps most significantly, system-$Z^+$ is capable of responding
not merely to empirical observations but also to linguistically transmitted
information such as conditional sentences (i.e., if-then rules). For example,
suppose someone tells us that “typically, if a person works, that person is
compensated” ($w \rightarrow c$); we add this new rule to our knowledge base
(verifying first that the addition is admissible), recompute $Z^+$, and are
prepared to respond to new observations or hearsay. In Spohn’s system, where
revisions begin with a given ranking function $\kappa$, one cannot properly
revise beliefs in response to new conditional sentences, because, to maintain
consistency and coherence, such revision must depend not only on the initial
ranking but also on the conditional rules that brought about that initial
ranking. Two knowledge bases $\Delta_1$ and $\Delta_2$ might give rise to the
same ranking function $\kappa^+$ and, yet, the new conditional can be
consistent with $\Delta_1$ and inconsistent with $\Delta_2$. As an example,
consider the sets $\Delta_1 = \{a \rightarrow b\}$ and
$\Delta_2 = \{a \rightarrow b,\ \neg b \rightarrow \neg a\}$. The ranking
$\kappa^+$ for these knowledge bases is the same (see Table 4.6). The knowledge
base $\Delta'_1 = \Delta_1 \cup \{\neg b \rightarrow a\}$ is consistent, as
shown on the right-hand side of Table 4.6. On the other hand, the knowledge
base $\Delta'_2 = \Delta_2 \cup \{\neg b \rightarrow a\}$ is inconsistent.
Clearly, these two situations require different procedures for absorbing the
new conditional.

$\kappa^+$   $\Delta_1$, $\Delta_2$                              $\Delta'_1 = \Delta_1 \cup \{\neg b \rightarrow a\}$
0            $(a, b)$, $(\neg a, b)$, $(\neg a, \neg b)$         $(a, b)$, $(\neg a, b)$
1            $(a, \neg b)$                                       $(a, \neg b)$
2            Empty                                               $(\neg a, \neg b)$

Table 4.6: Ranking $\kappa^+$ for $\Delta_1 = \{a \rightarrow b\}$,
$\Delta_2 = \{a \rightarrow b,\ \neg b \rightarrow \neg a\}$, and $\Delta'_1$.

The AGM postulates, likewise, are inadequate for characterizing the process of
incorporating new conditionals, because they are formulated as transformations
on belief sets and are thus oblivious to the set of conditionals that shaped
those belief sets, and which the new conditional is about to join.25
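
The contrast between the two knowledge bases can be checked by brute force. The Python sketch below assumes the tolerance-based consistency test associated with system-$Z$ (a set of rules is consistent iff rules can be peeled off layer by layer, each layer consisting of rules tolerated by whatever remains) and the flat case in which all rule strengths are zero; the encoding of rules as (antecedent, consequent) pairs over worlds $(a, b)$ and all function names are illustrative assumptions.

```python
from itertools import product

WORLDS = list(product([True, False], repeat=2))   # worlds are pairs (a, b)

def falsifies(w, rule):
    ante, cons = rule
    return ante(w) and not cons(w)

def tolerated(rule, rules):
    """Some world verifies `rule` and falsifies no rule in `rules`."""
    ante, cons = rule
    return any(ante(w) and cons(w) and not any(falsifies(w, r) for r in rules)
               for w in WORLDS)

def z_layers(rules):
    """Priority layers (index = Z value), or None if the set is inconsistent."""
    remaining, layers = list(rules), []
    while remaining:
        layer = [r for r in remaining if tolerated(r, remaining)]
        if not layer:
            return None
        layers.append(layer)
        remaining = [r for r in remaining if r not in layer]
    return layers

def kappa(rules):
    """Rank of each world: 1 + highest layer of a falsified rule, else 0."""
    layers = z_layers(rules)
    def k(w):
        hit = [z for z, layer in enumerate(layers) for r in layer if falsifies(w, r)]
        return 1 + max(hit) if hit else 0
    return {w: k(w) for w in WORLDS}

a_b   = (lambda w: w[0],     lambda w: w[1])       # a -> b
nb_na = (lambda w: not w[1], lambda w: not w[0])   # not b -> not a
nb_a  = (lambda w: not w[1], lambda w: w[0])       # not b -> a

delta1, delta2 = [a_b], [a_b, nb_na]
print(kappa(delta1) == kappa(delta2))           # True: same ranking
print(z_layers(delta1 + [nb_a]) is not None)    # True: still consistent
print(z_layers(delta2 + [nb_a]) is not None)    # False: inconsistent
print(kappa(delta1 + [nb_a])[(False, False)])   # 2
```

A ranking function alone cannot distinguish the two cases, since `kappa(delta1)` and `kappa(delta2)` coincide; only the rule sets themselves reveal whether the new conditional can be absorbed.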

The ability to adopt new conditionals (as rules) also provides a simple
semantics for interpreting nested conditionals (e.g., “if you wear a helmet
whenever you ride a motorcycle, then you won’t get hurt badly if you fall”26).
Nested conditionals cease to be a mystery once we permit explicit references to
default rules. The sentence “If $(a \rightarrow b)$ then $(c \rightarrow d)$”
is interpreted as

“If I add the default $a \rightarrow b$ to $\Delta$, then the conditional
$c \rightarrow d$ will be satisfied by the consequence relation of the
resulting knowledge base $\Delta' = \Delta \cup \{a \rightarrow b\}$”,

which is clearly a proposition that can be tested in the language of
default-based ranking systems. Note the essential distinction between having a
conditional sentence $a \rightarrow b$ explicitly in $\Delta$ versus having a
conditional sentence $a \rightarrow b$ satisfied by the consequence relation
$|\!\sim$ of $\Delta$. In both cases the conditional $a \rightarrow b$ would
meet the Ramsey test, but only the former case would resist the adoption of the
conditional $a \rightarrow \neg b$. This distinction gets lost in systems that
do not acknowledge defaults as the basis for ranking and beliefs.27

4.7  Discussion 

This chapter proposes a belief-revision system that reasons semi-tractably and
plausibly with linguistic quantification of both observational reports (e.g.,
“looks like”) and domain rules (e.g., “typically”). The system is
semi-tractable in the sense that it is tractable for every sublanguage in which
propositional satisfiability is polynomial (Horn expressions, network theories,
acyclic expressions, etc.). To the best of my knowledge, this is the first
system that reasons with approximate probabilities and offers such broad
guarantees of tractability. Whereas most tractability results exploit the
topological structure of the knowledge base [20, 71, 97] (hypertrees, or
partial hypertrees), ours are topology-independent. These results should carry
over to the theory of possibility as formulated by Dubois and Prade [28], which
has similar features to Spohn’s system except that beliefs are measured on the
real interval [0,1]. In addition, as Section 4.5 shows, the system can also
accommodate expressions of imprecise observations without loss of tractability,
thus providing a good model for weighing the impact of evidence and
counter-evidence on our beliefs. Also the enterprise of belief revision, as
formulated in the work presented in [3, 33], can find a tractable and natural
embodiment in system-$Z^+$, unhindered by difficulties that plagued earlier
systems.

25Gärdenfors [33, pp. 156-160] attempts to devise postulates for conditional
sentences, but finds them incompatible with the Ramsey test.

26Judea Pearl attributes this example to Philip Calabrese (personal
communication).

27Belief revision systems proposed in the database literature [31, 19] suffer
from the same shortcoming. In that context, defaults represent integrity
constraints with exceptions.

From the perspective of defeasible reasoning, system-$Z^+$ provides the user
with the power to explicitly set priorities among default rules, and
simultaneously maintains a proper account for specificity relations. However,
it inherits some of the deficiencies of system-$Z$ [100],28 the main one being
the inability to sanction inheritance across exceptional subclasses (see
Exm. 3.2). To illustrate this problem consider adding a fifth rule
$b \xrightarrow{\delta_5} l$ (“birds have legs”) to the set of rules in
Example 4.1:

$r_1$: $b \xrightarrow{\delta_1} f$
$r_2$: $p \xrightarrow{\delta_2} b$
$r_3$: $p \xrightarrow{\delta_3} \neg f$
$r_4$: $f \xrightarrow{\delta_4} a$
$r_5$: $b \xrightarrow{\delta_5} l$

We would normally conclude from this set that “penguins have legs”, while
system-$Z$ (with $\delta_i = 0$) will consider “penguins” exceptional “birds”
with respect to all properties, including “having legs”. The
$\kappa^+$-ranking now allows the rule author to partially bypass this obstacle
by adjusting the $\delta$’s. If $\delta_5$ is set to be bigger than $\delta_1$
(to express perhaps the intuition that anatomic properties are more typical
than developmental facilities) then the system will conclude that “typically
penguins have legs”.29 This solution, however, is not entirely satisfactory. If
we add to this new set of rules a class of “birds” which are “legless”,
system-$Z^+$ will conclude that either “penguins have legs” or “legless birds
fly” but not both.30 In order to overcome this difficulty, a system must comply
with the preference condition in Proposition 3.16. As shown in Section 3.4,
maximum entropy is one such system; two other systems that satisfy this
condition are Geffner’s Conditional Entailment [36, 38] and the proposal by
Boutilier [15].

28And the rational closure described in [72].

29Note that the fact that “penguins” are only exceptional with respect to
“flying” (and not necessarily with respect to “having legs”) is automatically
encoded in the $Z^+$ ranking by forcing $Z^+(r_3)$ to exceed
$Z^+(r_1) + \delta_3$ independently of $\delta_5$ (and $Z^+(r_5)$).

30This counterexample is due to Kurt Konolige.

In Geffner’s conditional entailment, rather than letting rule priorities
dictate a ranking function on models, a partial order on interpretations is
induced instead. To determine the preference between $\omega$ and $\omega'$, we
examine the highest priority rules that distinguish between the two, i.e., that
are falsified by one and not by the other. If all such rules remain unfalsified
in one of the two possible worlds, then this model is the preferred one.
Formally, if $\mathcal{F}[\omega]$ and $\mathcal{F}[\omega']$ stand for the set
of rules falsified by $\omega$ and $\omega'$ respectively, then $\omega$ is
preferred to $\omega'$ iff $\mathcal{F}[\omega] \neq \mathcal{F}[\omega']$ and
for every rule $r$ in $\mathcal{F}[\omega] - \mathcal{F}[\omega']$ there exists
a rule $r'$ in $\mathcal{F}[\omega'] - \mathcal{F}[\omega]$ such that $r'$ has
a higher priority than $r$ (written $r' > r$). Thus, a model $\omega$ will
always be preferred to $\omega'$ if it falsifies a proper subset of the rules
falsified by $\omega'$ (see Prop. 3.16).

Priorities among rules in Geffner’s proposal differ also from both the
proposals in Chapter 3 and this chapter, in that the rule priority relation is
a partial order as well. This partial order is determined by the following
interpretation of the rule $\varphi \rightarrow \psi$: If $\varphi$ is all we
know, then, regardless of other rules that $\Delta$ may contain, we are
authorized to assert $\psi$. This means that $r : \varphi \rightarrow \psi$
should get a higher priority than any argument (a chain of rules) leading from
$\varphi$ to $\neg\psi$ and, more generally, if a set $\Delta' \subset \Delta$
does not tolerate $r$, then at least one rule in $\Delta'$ ought to have a
lower priority than $r$. In the example above,31 $r_3 : p \rightarrow \neg f$
is not tolerated by the set $\{r_2 : p \rightarrow b,\ r_1 : b \rightarrow f\}$,
hence we must have that $r_2 < r_3$ or $r_1 < r_3$. Similarly, the rule
$r_2 : p \rightarrow b$ is not tolerated by
$\{r_3 : p \rightarrow \neg f,\ r_1 : b \rightarrow f\}$ and hence we also have
$r_3 < r_2$ or $r_1 < r_2$. These two conditions, together with the asymmetry
and transitivity of $<$, yield $r_1 < r_2$ and $r_1 < r_3$. Note that in this
partial order $r_4$ cannot be compared to any of the other rules. In general,
we say that a proposition $\sigma$ is conditionally entailed by $\phi$ (in the
context of a set $\Delta$) if $\sigma$ holds in all the preferred models for
$\phi$ induced by every priority ordering admissible for $\Delta$. Conditional
entailment rectifies many of the shortcomings of system-$Z$, as well as some
weaknesses of the entailment relation induced by maximum entropy. However,
having been based on model minimization as well as on enumeration of subsets of
rules, its computational complexity might be overbearing. A proof theory for
conditional entailment can be found in [36].
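
The preference test on worlds can be written down directly once a priority relation is fixed. The following Python sketch is a minimal illustration, assuming the four flat rules $r_1$ to $r_4$ of the example and a priority order in which $r_1$ ranks below both $r_2$ and $r_3$; the dictionary encoding of worlds, the names `RULES`, `HIGHER`, `falsified`, and `preferred`, and the three sample worlds are hypothetical.

```python
# Rules as (antecedent, consequent) over worlds given as dicts of atoms.
RULES = {
    "r1": (lambda w: w["b"], lambda w: w["f"]),       # b -> f
    "r2": (lambda w: w["p"], lambda w: w["b"]),       # p -> b
    "r3": (lambda w: w["p"], lambda w: not w["f"]),   # p -> not f
    "r4": (lambda w: w["f"], lambda w: w["a"]),       # f -> a
}

# Assumed priority relation: a pair (r_hi, r_lo) means r_hi > r_lo.
HIGHER = {("r2", "r1"), ("r3", "r1")}

def falsified(w):
    """Names of the rules whose antecedent holds and consequent fails in w."""
    return {name for name, (ante, cons) in RULES.items()
            if ante(w) and not cons(w)}

def preferred(w1, w2):
    """w1 is preferred to w2 iff they falsify different rule sets and every
    rule falsified only by w1 is dominated by a rule falsified only by w2."""
    f1, f2 = falsified(w1), falsified(w2)
    return f1 != f2 and all(
        any((r_hi, r) in HIGHER for r_hi in f2 - f1) for r in f1 - f2)

grounded_penguin = dict(p=True, b=True, f=False, a=False)   # falsifies r1
flying_penguin   = dict(p=True, b=True, f=True,  a=True)    # falsifies r3
plain_bird       = dict(p=False, b=True, f=True, a=True)    # falsifies nothing

print(preferred(grounded_penguin, flying_penguin))   # True
print(preferred(flying_penguin, grounded_penguin))   # False
print(preferred(plain_bird, flying_penguin))         # True: proper-subset case
```

Conditional entailment itself quantifies over every admissible priority ordering and over all preferred models, which this sketch does not attempt; it only shows the pairwise comparison for one fixed ordering.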

Boutilier [15] proposed a system which combines the priority ordering of
system-$Z$ (i.e., the flat version of system-$Z^+$) with Brewka’s [16] notion
of preferred subtheories. Thus, whereas system-$Z^+$ assigns equal rank to any
two worlds that violate a rule $r$ with $Z^+(r) = z$ and no rule of higher
$Z^+$, the proposal in [15] will make further comparisons in terms of rules of
lower priority violated in these worlds. In the case above, since any minimal
world satisfying $p \wedge l$ must violate a proper subset of the rules
violated by any minimal model for $p \wedge \neg l$, the desired conclusion is
certified. These notions are formalized in terms of the modal logic CO*, which
is semantically related to the probabilistic interpretation proposed in this
dissertation [14]. Nevertheless, counterintuitive examples to this notion of
entailment can be found in [36, 48]. While Boutilier’s proposal appears to be
simpler than conditional entailment (as it does not require partial orders),
its computational effectiveness is yet to be analyzed.

31Assuming a flat version where all $\delta$’s are zero.
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CHAPTER  5 


Causality 


5.1  Introduction 

Independently  of  whether  causality  is  a  property  of  nature  or  a  conceptual  con¬ 
venience,  the  organization  of  knowledge  as  cause-effect  relations  is  fundamental 
for  tasks  of  prediction  and  explanation.  This  chapter  introduces,  within  the  basic 
framework  of  ranking  systems,  a  simple  mechanism  called  stratification  for  the 
representation  of  causal  relationships,  actions,  and  changes. 

The lack of a mechanism for distinguishing causal relationships from other
kinds of associations has been a serious deficiency in most nonmonotonic
systems [96], the classical illustration of which is given by the now-famous Yale
Shooting Problem (YSP) [57]. In its simplified version, the YSP builds the
expectation that if a gun is loaded at time t0 and Fred is shot with the gun at time
t1, Fred should be dead at time t2, despite the normal tendency of being alive to
persist. Many formulations, including circumscription [88], default logic [108],
maximum entropy (Chap. 3), system-Z+ (Chap. 4), and conditional entailment
[36], do not yield the expected conclusion. Instead they reveal an alternative,
perfectly symmetrical version of reality, whereby somehow the gun got unloaded
and Fred is alive at time t2.

The inclination to choose the scenario in which Fred dies is grounded in
notions of directionality and asymmetry that are particular to causal relationships.
This chapter shows that these notions can be derived from one fundamental
principle, Markov shielding, which can be embodied naturally in preferential model
semantics using the device of stratified rankings. Informally, the principle can be
stated as follows:

•  Knowing the set of causes for a given effect renders the effect independent
of all prior events.

In the YSP, given the state of the gun at time t1, the effect of the shooting can
be predicted with total disregard for the gun’s previous history.

This chapter proposes a probabilistically motivated, ranked-model semantics
for rules of the form “typically, if cause1 and ... and causen, then effect”, which
incorporates the above principle under the assumption that “causes” precede their
“effects”. As a by-product, our semantics exhibits another feature characteristic
of causal organizations: modularity. Informally,

•  Adding rules that predict future events cannot invalidate beliefs concerning
previous events.

This is analogous to a phenomenon we normally associate with causal mechanisms
such as logical gates in electrical circuits, where connecting the inputs of a new
gate to an existing circuit does not alter the circuit’s behavior [21].

Although  several  remedies  were  proposed  for  the  YSP  within  conventional 
nonmonotonic  formalisms  [118,  36,  121,  8,  79],  the  formalism  explored  in  this 
chapter  seeks  to  uncover  remedies  systematically  from  basic  probabilistic  princi¬ 
ples  [97,  pp.  509-516].  Incorporating  such  principles  in  the  qualitative  context 
of  world  ranking  yields  useful  results  on  several  frontiers.  In  prediction  tasks 
(such  as  the  YSP),  our  formalism  prunes  the  undesirable  scenarios,  without  the 
strong  commitment  displayed  by  chronological  minimization  [118]  and  without 
the  addition  of  external  causal  operators  to  the  conditional  interpretation  of  the 
rules  [36]  (see  Section  5.3).  In  abduction  tasks  (such  as  when  Fred  is  seen  alive  at 
t2), our formalism yields plausible explanations for the facts observed (e.g.,
similar to [121], the gun must have been unloaded sometime before the shooting at
t1). This suggests that the principle of Markov shielding, by being grounded in
probability  theory  (hence  in  empirical  reality),  can  provide  a  coherent  framework 
for  the  many  facets  of  causation  found  in  commonsense  reasoning.  Moreover, 
given  the  connection  formed  among  causation,  defaults,  and  probability,  we  can 
now  ask  not  merely  how  to  reason  with  a  given  set  of  causal  assertions  but  also 
whether  those  assertions  are  compatible  with  a  given  stream  of  observations.  A 
framework  for  explanations  is  further  discussed  in  Section  5.3.2. 

Section  5.3.1  defines  a  notion  of  consistency  in  the  context  of  causal  rules, 
and  briefly  compares  it  to  the  notion  of  p-consistency  introduced  in  Chapter  2. 
Section  5.4  demonstrates  how  rank-based  systems  can  embody  and  unify  the  the¬ 
ories  of  belief  revision  [3]  and  belief  updating  [65].  Whereas  belief  revision  deals 
with  new  information  obtained  through  new  observations  in  a  static  world,  belief 
update  deals  with  tracing  changes  in  an  evolving  world,  such  as  that  subjected 
to  the  external  influence  of  actions. 

As shown in Section 4.6, system-Z+ offers a natural embodiment of the
principles of belief revision as formulated by Alchourrón, Gärdenfors and Makinson
(AGM) [3], with the additional features of enabling the absorption of new
conditional sentences and the verification of counterfactual sentences and nested
conditionals. The addition of stratification to system-Z+, by virtue of
representing actions and causation, also provides the necessary machinery for embodying
belief updates consistent with the principles proposed by Katsuno and Mendelzon
(KM) [65].

5.2  Stratified  Rankings 

Let X = {x1, ..., xn} be a finite set of atomic propositions. Let c1, ..., cm and e
be literals over the elements of X. A rule in this chapter is defined as the default
c1 ∧ ... ∧ cm → e,1 where the conjunction “c1 ∧ ... ∧ cm” is called the antecedent
of the rule and “e” its consequent.2

Given X and a set A of rules, the underlying characteristic graph for (X, A)
is the directed graph Γ(X,A) such that there is a node vi for each xi ∈ X, and there
is a directed edge from vi to vj iff there is a rule r in A where xi (or ¬xi) is part
of the antecedent of r, and xj (or ¬xj) is the consequent of r. We say that A is
a causal network (or network for short) if Γ(X,A) is acyclic (i.e., Γ(X,A) is a DAG).
If vr, ..., vs are the parents of vt in Γ(X,A), then the set {xr, ..., xs} is called the
parent set of xt and the set {xr, ..., xs} ∪ {xt} is called a family. Intuitively,
the parent set of an event e represents all the known causes for e. A network A
induces a strict partial order ≺ on the elements of X where xi ≺ xj iff there is a
directed path from vi to vj in Γ(X,A). We will use O(X) to denote any total order
on the elements of X satisfying ≺.3 Intuitively, ≺ represents a natural order on
events where causes precede their effects. As an example, Figure 5.1 depicts the
underlying graph for the following set of rules to be used in Example 5.1:

r1: tk → cs (“typically, if I turn the ignition key the car starts”).

r2: tk ∧ bd → ¬cs (“typically, if I turn the ignition key and the battery is dead,
the car will not start”).

1 For simplicity we will not introduce a new connective (e.g., a subscripted variant of →),
and we will only consider flat causal rules. Section 5.3.3 explores the use of variable
strength rules in a causal context.

2 The form c1 ∧ ... ∧ cm → e does not restrict the development of this chapter but it clarifies
the exposition. A causal rule may take on the general form α(c1, ..., cm) → β(e1, ..., en)
where α and β are any Boolean formulae. Any α(c1, ..., cm) can be simulated by a set
of simpler rules, each containing a conjunction of atomic antecedents. Moreover, any rule
α(c1, ..., cm) → β(e1, ..., en) can be represented by the following set of rules:
α(c1, ..., cm) → e′, β(e1, ..., en) ⇒ e′, and ¬β(e1, ..., en) ⇒ ¬e′, where e′ is a dummy
variable and ⇒ is a strict conditional. The role of strict conditionals in a causal setting
is introduced in Section 5.3.

3 Note that, in particular, any ordering O(X) induced by a topological sort on the nodes of
Γ(X,A), where xi < xj if vi precedes vj in the topological sort, satisfies ≺.


Figure 5.1: Underlying graph for the causal rules in the battery example

r3: lo → bd (“typically, if I leave the headlights on all night the battery is
dead”).
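The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the encoding of a rule as a pair (antecedent literals, consequent literal) and the names `RULES`, `parent_sets` and `causal_order` are hypothetical choices made here, not notation from the text. The sketch builds the parent sets for rules r1-r3, and uses the standard-library `graphlib` module to test acyclicity and to produce one total order O(X) in which causes precede effects.

```python
from graphlib import TopologicalSorter, CycleError

# A rule is (antecedent_literals, consequent_literal); a literal is
# (atom, polarity). This encoding is an assumption made for the sketch.
RULES = [
    ([("tk", True)], ("cs", True)),                 # r1: tk -> cs
    ([("tk", True), ("bd", True)], ("cs", False)),  # r2: tk and bd -> not cs
    ([("lo", True)], ("bd", True)),                 # r3: lo -> bd
]
ATOMS = ["lo", "tk", "bd", "cs"]


def parent_sets(atoms, rules):
    """Map each atom to its parent set in the characteristic graph."""
    parents = {x: set() for x in atoms}
    for antecedent, (effect, _polarity) in rules:
        for cause, _ in antecedent:
            parents[effect].add(cause)  # directed edge cause -> effect
    return parents


def causal_order(atoms, rules):
    """Return one total order O(X) consistent with the graph, or None if cyclic."""
    try:
        # TopologicalSorter takes a mapping node -> predecessors.
        return list(TopologicalSorter(parent_sets(atoms, rules)).static_order())
    except CycleError:
        return None  # the rule set is not a causal network


parents = parent_sets(ATOMS, RULES)
order = causal_order(ATOMS, RULES)
print(parents["cs"], parents["bd"], order)
```

Any order returned by `causal_order` places lo before bd and both tk and bd before cs; adding a rule such as cs → lo would create a cycle, and the function would then return `None`.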

In previous chapters, the interpretation of a rule φ → ψ was based on the
condition of admissibility of κ (see Def. 4.2). A ranking κ is admissible relative
to A iff for every φi → ψi ∈ A:4

    \kappa(\neg\psi_i \mid \varphi_i) > 0    (5.1)

We  now  extend  this  requirement  and  introduce  a  stratification  constraint  that 
will  endow  the  rules  with  a  causal  character. 

Definition 5.1 (Stratified Rankings) Given a network A, an admissible
ranking κ relative to A, and an ordering O(X), let Xi (1 ≤ i ≤ n) denote a
literal variable taking values from {xi, ¬xi}, and let Par_Xi denote the conjunction
Xr ∧ ... ∧ Xs where {xr, ..., xs} is the parent set of xi. We say that κ is stratified
for A under O(X) if, for 2 ≤ i ≤ n and for any instantiation of the variables
X1, ..., Xi, we have

    \kappa(X_i \mid X_{i-1} \wedge \dots \wedge X_1) = \kappa(X_i \mid Par_{X_i})    (5.2)


□ 

Eq. 5.2 says that in a stratified ranking the incremental surprise of finding Xi
in a full description of some past scenario must be equal to the incremental
surprise of finding Xi given just the state of Par_Xi in that same scenario. Thus,
the parent set of an event xi (Par_Xi) shields this event xi from all prior events
(see Fig. 5.2). This condition parallels the Markovian independence conditions
embodied in Bayes Networks (BN) [97].

Figure 5.2: Stratification condition.

4 Assuming all δi’s are equal to zero; otherwise admissibility would require that
κ(¬ψi | φi) > δi in Eq. 5.1.

A BN is a pair (Γ, P) where Γ is a DAG and P is a probability distribution.
Each node vi in Γ corresponds to a variable Xi in P, and P decomposes into the
product:

    P(X_n, \dots, X_1) = \prod_{i=1}^{n} P(X_i \mid Par_{X_i})    (5.3)

which, similarly to Eq. 5.2, incorporates the assumption that the parent set of
any given variable Xi renders Xi probabilistically independent of all its
predecessors (in the given ordering). Causal networks can in fact be regarded as an
order-of-magnitude abstraction of BN’s, where exact numerical probabilities are
replaced by integer-valued levels of surprise (κ), addition is replaced by min,
and multiplication is replaced by addition (see [53, 120, 102]). Eq. 5.2 can be
re-written to mirror Eq. 5.3 as:5

    \kappa(X_n \wedge \dots \wedge X_1) = \sum_{i=1}^{n} \kappa(X_i \mid Par_{X_i})    (5.4)

Note that Eq. 5.4 also constitutes an effective test for checking whether a given
ranking κ is stratified for an arbitrary network A. The test can be made recursive
if we write Eq. 5.4 as

    \kappa(X_m \wedge \dots \wedge X_1) = \sum_{i=1}^{m} \kappa(X_i \mid Par_{X_i}),  \quad m = 1, 2, \dots, n    (5.5)

which follows from Eq. 5.4 after marginalizing6 over {Xn, ..., Xm+1}, m =
1, 2, ..., n. We shall show that the requirement of stratification augments
admissible rankings with the properties of Markov shielding and modularity (see
Theorems 5.6 and 5.7 below), which we normally attribute to causal organizations.

5 An even coarser abstraction of Eq. 5.3 in the context of relational databases can be found
in [21], where the stratification condition is imposed on relations and then used in finding
backtrack-free solutions for constraint satisfaction problems.

6 In probability theory, we marginalize over {Xn, ..., Xm+1} by summing over all
instantiations for these variables; thus, we have
P(Xm, ..., X1) = Σ_{Xn, ..., Xm+1} P(Xn, ..., X1). It follows from Eqs. 4.1-4.3 that
κ(Xm ∧ ... ∧ X1) = min_{Xn, ..., Xm+1} κ(Xn ∧ ... ∧ X1).

The following theorem states that the stratification criterion (Eq. 5.2) does
not depend on the specific ordering O(X). This implies that in order to test
whether a given ranking κ is stratified relative to a network A, it is enough to
test Eq. 5.2 against any one ordering O(X).

Theorem 5.2 Given a network A, let O1(X) and O2(X) be two orderings of the
elements in X according to A. If κ is stratified for A under O1(X), then κ is
stratified for A under O2(X).

To illustrate the nature of stratification, we will compare two admissible
rankings associated with the network A = {a → ¬c, b → c}. A stratified ranking for
A is depicted on the left-hand side of Table 5.1 (κs), while the ranking on the
right-hand side represents the κ+ (system-Z+) ranking for A.7

κ    κs: Stratified                           κ+: System-Z+
0    (¬a, b, c), (¬a, ¬b, c), (¬a, ¬b, ¬c)     (¬a, b, c), (¬a, ¬b, c), (¬a, ¬b, ¬c), (a, ¬b, ¬c)
1    (a, ¬b, ¬c), (a, b, ¬c), (¬a, b, ¬c)      (a, b, c), (a, b, ¬c), (¬a, b, ¬c), (a, ¬b, c)
2    (a, b, c), (a, ¬b, c)                     no worlds in this rank

Table 5.1: Stratified (κs) and κ+ rankings for {a → ¬c, b → c}.

7 The maximum entropy ranking κ* for this network A is identical to κ+.

In order to show that κ+ is not stratified we select the order O = (A, B, C) (which
agrees with the characteristic DAG of the network A) and test whether
κ+(¬c ∧ b ∧ a) satisfies Eq. 5.4. From Table 5.1, κ+(¬c ∧ a ∧ b) = 1 and
κ+(¬c | a ∧ b) = κ+(a) = κ+(b) = 0, and therefore
κ+(¬c ∧ a ∧ b) ≠ κ+(¬c | a ∧ b) + κ+(a) + κ+(b), contrary to the
requirements of Eq. 5.4. Alternatively, we can use the Markov shielding property (to be
proven in Thm. 5.6) according to which the parents of every variable render that
variable independent of all its other predecessors. Since B is a root node in the
characteristic DAG of the network, it has no parents, and it must therefore be (marginally)
independent of all its predecessors, namely of A. In terms of ranking functions
this requirement of independence translates into

    \kappa(A \wedge B) = \kappa(A) + \kappa(B)    (5.6)

for all instantiations of the literals A and B (taking values from {a, ¬a} and
{b, ¬b} respectively). As can be verified from Table 5.1, κs complies with Eq. 5.6.8
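The comparison between the two rankings of Table 5.1 can be replayed mechanically. The following Python sketch copies the ranks from the table and checks admissibility (the flat case of Eq. 5.1) and stratification (Eq. 5.2) under the order (A, B, C). The helper names and the dictionary encoding of worlds are hypothetical, and the conditional rank is computed as κ(L | G) = κ(L ∧ G) − κ(G), which is assumed here to be the definition given by Eqs. 4.1-4.3 (not reproduced in this section).

```python
from itertools import product

T, F = True, False
ORDER = ("a", "b", "c")                        # the order O = (A, B, C)
PARENTS = {"a": (), "b": (), "c": ("a", "b")}  # graph of {a -> not c, b -> c}

# Worlds are (a, b, c) triples; the ranks are copied from Table 5.1.
KAPPA_S = {
    (F, T, T): 0, (F, F, T): 0, (F, F, F): 0,
    (T, F, F): 1, (T, T, F): 1, (F, T, F): 1,
    (T, T, T): 2, (T, F, T): 2,
}
KAPPA_PLUS = {
    (F, T, T): 0, (F, F, T): 0, (F, F, F): 0, (T, F, F): 0,
    (T, T, T): 1, (T, T, F): 1, (F, T, F): 1, (T, F, T): 1,
}


def rank(kappa, literals):
    """kappa of a conjunction of literals, given as {atom: truth value}."""
    return min(k for world, k in kappa.items()
               if all(world[ORDER.index(x)] == v for x, v in literals.items()))


def cond_rank(kappa, literals, given):
    """Assumed definition: kappa(L | G) = kappa(L and G) - kappa(G)."""
    return rank(kappa, {**given, **literals}) - rank(kappa, given)


def is_admissible(kappa):
    """Flat case of Eq. 5.1 for the rules a -> not c and b -> c."""
    return (cond_rank(kappa, {"c": T}, {"a": T}) > 0       # kappa(c | a) > 0
            and cond_rank(kappa, {"c": F}, {"b": T}) > 0)  # kappa(not c | b) > 0


def is_stratified(kappa):
    """Eq. 5.2 for every instantiation and every X_i with i >= 2."""
    for values in product([T, F], repeat=len(ORDER)):
        inst = dict(zip(ORDER, values))
        for i in range(1, len(ORDER)):
            x = ORDER[i]
            past = {y: inst[y] for y in ORDER[:i]}
            par = {y: inst[y] for y in PARENTS[x]}
            if cond_rank(kappa, {x: inst[x]}, past) != cond_rank(kappa, {x: inst[x]}, par):
                return False
    return True


print(is_admissible(KAPPA_S), is_stratified(KAPPA_S))        # True True
print(is_admissible(KAPPA_PLUS), is_stratified(KAPPA_PLUS))  # True False
```

Both rankings pass the admissibility test, but only κs passes the stratification test. The failure for κ+ arises exactly where the text locates it: κ+(a ∧ b) = 1 while κ+(a) + κ+(b) = 0.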

5.3  c-Entailment

Given a network A, each stratified ranking κ defines a consequence relation |~κ
where φ |~κ σ iff κ(σ ∧ φ) < κ(¬σ ∧ φ) or κ(φ) = ∞. A consequence relation is
said to be proper for φ |~κ σ iff κ(φ) ≠ ∞.

Definition 5.3 (c-Entailment) A network A c-entails σ given φ, written φ |~ σ,
iff φ |~κ σ in every κ stratified for A which is proper for φ |~κ σ.

□

In other words, given A, we can expect σ from the evidence φ iff the preference
constraint conveyed by φ → σ is satisfied by every stratified ranking for A.
Def. 5.3 parallels the definition of probabilistic entailment (Def. 3.9), with
the only difference being that the rankings for |~ must be stratified. We remark
that c-entailment is not to be interpreted as stating that φ is believed to cause
σ. Rather, it expresses an expectation to find σ true in the context of φ, having
given a causal character to the rules in A.

Since the set of stratified rankings for a given A is a subset of the admissible
rankings for A, every stratified consequence relation must satisfy the rules of
inference of Logic, Cumulativity and Cases introduced in Section 3.2. It follows,
then, that the rules of inference below are sound for c-entailment.

Theorem 5.4 Let φ, ψ, γ, σ, and their conjunction be satisfiable wffs. The
following are sound rules of inference for |~:

1.  (Defaults) If φ → ψ (or φ ⇒ ψ) ∈ A then φ |~ ψ.

2.  (Logic) If ⊨ φ ⊃ ψ then φ |~ ψ.

3.  (Augmentation) If φ |~ ψ and φ |~ γ then φ ∧ γ |~ ψ.

4.  (Cut) If φ |~ γ and φ ∧ γ |~ ψ then φ |~ ψ.

5.  (Cases) If φ |~ ψ and γ |~ ψ then φ ∨ γ |~ ψ.

The first rule (Defaults) follows immediately from the requirement of admissibility.
Rules 2 and 5 correspond to the rules of Logic and Cases of Section 3.2, and rules 3
and 4 simply rewrite the Cumulativity rule of Section 3.2 by breaking the iff into
two cases.

8 Eq. 5.6 can also be obtained from Eq. 5.5 by setting m = 2.

The following are derived rules of inference; they further illustrate the logical
properties of c-entailment and will be used in Examples 5.1 and 5.2.9

Theorem 5.5 Derived rules of inference:

1.  (Deductive closure) If φ |~ ψ and φ |~ γ and ⊨ φ ∧ ψ ∧ γ ⊃ σ then
φ |~ σ.

2.  (Presuppositions) If φ |~ ψ and φ ∧ γ |~ ¬ψ then φ |~ ¬γ.

3.  (And) If φ |~ ψ and φ |~ γ then φ |~ ψ ∧ γ.

These  rules  of  inference  are  also  sound  with  respect  to  p-entailment  (and 
probabilistic  entailment)  and,  therefore,  as  discussed  in  previous  chapters,  are 
too  weak  to  constitute  a  full  account  of  plausible  reasoning.  The  next  two  the¬ 
orems  provide  additional  inference  power  (reflecting  the  stratification  condition) 
which  emanates  from  the  causal  structure  of  A.  They  establish  conditions  under 
which these inference rules can be applied modularly to subsets A' ⊆ A with the
guarantee  that  the  resulting  inferences  will  hold  in  A. 

Theorem 5.6 Let A be a network, and let {pr, ..., ps} be a set of literals
corresponding to the parent set {xr, ..., xs} of xt (each pi, r ≤ i ≤ s, is either xi
or ¬xi). Let e_xt denote a literal built on xt, and let Y = {y1, ..., ym} be a set
of atomic propositions such that no yi ∈ Y is a descendant of xt in Γ(X,A). Let
φY be any wff built only with elements from Y such that φY ∧ pr ∧ ... ∧ ps is
satisfiable. If pr ∧ ... ∧ ps |~ e_xt then φY ∧ pr ∧ ... ∧ ps |~ e_xt.

Theorem 5.7 Let X' ⊆ X and A' ⊆ A be such that all rules in A' are built with
atomic propositions in X', and if x' ∈ X' then all the rules in A with either x'
or ¬x' as their consequent are also in A'. Let φ and ψ be two wffs built with
elements from X'. If φ |~ ψ holds for A', then φ |~ ψ holds for A.

These theorems confirm that stratified rankings exhibit the properties of Markov
shielding and modularity. As a corollary to Theorem 5.7 it is easy to see that
c-entailment is insensitive to irrelevant propositions, and moreover, given two
networks with no causal interaction, their respective sets of plausible conclusions will
be independent of each other. To obtain a complete proof theory for c-entailment
the four axioms of graphoids [97, Chapter 3] need to be invoked.10 Theorems 5.6
and 5.7 cover the essence of these axioms and are sufficiently powerful to illustrate
the main features of c-entailment. Consider the following example:11

9 They are taken from [36], where a formal derivation in terms of the rules of inference in
Theorem 5.4 can be found.

Example 5.1 (Dead battery) The network A = {tk → cs, tk ∧ bd → ¬cs, lo →
bd} encodes the information that “typically if I turn the ignition key the car
starts”, “typically if I turn the ignition key and the battery is dead the car will
not start”, and “typically if I leave the headlights on all night the battery is
dead”. The underlying graph for this network is depicted in Figure 5.1. Given
A, and the fact that we left the headlights on all night, we don’t expect the car
engine to start once we turn the ignition key (i.e., lo ∧ tk |~ ¬cs). As in the case
of YSP, an unintended scenario exists, in which the car engine actually starts and
the battery is not dead after all. In both maximum entropy and system-Z+, for
example, κ+(lo ∧ tk ∧ cs) = κ+(lo ∧ tk ∧ ¬cs) and κ*(lo ∧ tk ∧ cs) = κ*(lo ∧ tk ∧ ¬cs),
and consequently ¬cs follows from lo ∧ tk in neither system-Z+ nor maximum
entropy. The reason for this behavior is that the κ(ω) in these approaches depends
on the priorities of rules violated in ω, and the priorities assigned to rules do not
properly reflect their relative position in the causal structure. Given that the key
is turned and the lights were left on, we know that either the rule tk → cs or the
rule lo → bd must be violated.12 In both these approaches, these rules receive the
same priority, and therefore the unintended scenario is as normal as the intended one.13

Table 5.2 contains an example of a stratified ranking for A, showing the
inequality κ(lo ∧ tk ∧ ¬cs) < κ(lo ∧ tk ∧ cs). Note that the surprise κ = 1 associated
with the world ω = lo ∧ bd ∧ tk ∧ ¬cs is not caused by any rule violation, but rather,
by the abnormality of event lo (as well as bd), whose κ is indeed 1. Although the
rule tk → cs is violated in ω, it does not contribute any additional surprise to
κ(ω) over and above κ(lo). Note also that the abnormality of the event lo was
not explicitly indicated by the rule author. Rather, it was deduced from the
stratified structure of A, which must render lo and tk independent; hence, if lo is
abnormal when tk is true (because one of the two rules must be violated) it must
also be abnormal when tk is false: refraining from turning the key cannot make us
believe that the lights were left on or that the battery is dead. This ability to infer
that lo (as well as bd) is an abnormal eventuality, a rather compelling inference
intuitively, is what distinguishes stratified ranking from maximum entropy and
system-Z+.14 Proposition 5.8 presents a formal derivation of lo ∧ tk |~ ¬cs:

10 The conditional independence defined by κ(X3 | X2, X1) = κ(X3 | X2) is clearly a graphoid
since κ represents infinitesimal probabilities (see [120, 62]).

11 This example is isomorphic to the YSP [36].

12 A third possibility is that tk ∧ bd → ¬cs is violated; but since this is a more specific rule
than tk → cs its Z-priority will be higher (in both maximum entropy and system-Z+), and
therefore no minimal model for either lo ∧ tk ∧ cs or lo ∧ tk ∧ ¬cs will violate this rule.

13 We could force the desired conclusion by setting the strengths δ of the rules to the
appropriate values. This, however, would require advance knowledge of all the rules in the knowledge
base (and their interactions). The objective in this chapter is a formalism able to extract the
necessary information automatically.


κ    worlds
0    (¬lo, ¬bd, tk, cs), (¬lo, ¬bd, ¬tk, ¬cs)
1    (lo, bd, tk, ¬cs), (lo, bd, ¬tk, ¬cs), (¬lo, bd, tk, ¬cs), (¬lo, bd, ¬tk, ¬cs)
2    (lo, ¬bd, tk, cs), (lo, ¬bd, ¬tk, ¬cs)
3    Rest of the worlds

Table 5.2: Stratified ranking for {tk → cs, tk ∧ bd → ¬cs, lo → bd}.

Proposition 5.8 lo ∧ tk |~ ¬cs

Proof: Let X' = {lo, bd, tk} and let A' = {lo → bd}.

1.  lo |~ bd in A' ; by the Defaults rule.

2.  tk ∧ lo |~ bd in A' ; by 1 and Theorem 5.6.

3.  tk ∧ lo |~ bd in A ; by 2 and Theorem 5.7.

4.  tk ∧ bd |~ ¬cs ; by the Defaults rule.

5.  tk ∧ bd ∧ lo |~ ¬cs ; by 4 and Theorem 5.6.

6.  tk ∧ lo |~ ¬cs ; by 3, 5 and the Cut rule.


The key intermediate steps in this derivation rely on Theorems 5.6 and 5.7,
which embody the principles of Markov shielding and modularity:

14 If the rule True → ¬bd is added to A, system-Z+ would yield the expected conclusion.

•  tk ∧ lo |~ bd. This follows from the proposition tk and from applying
Theorem 5.7 to the sub-network A' built from the language X' = {lo, bd, tk} and
containing only the rule lo → bd.

•  tk ∧ bd |~ ¬cs and tk ∧ bd ∧ lo |~ ¬cs. The former follows directly from
p-entailment, and the latter from applying Theorem 5.6 to the rule tk ∧ bd →
¬cs, and the proposition lo.
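The conclusion of Proposition 5.8 can also be checked numerically on one concrete stratified ranking. The sketch below builds such a ranking through the sum form of Eq. 5.4: each variable receives a local table κ(Xi | Par_Xi), and the rank of a world is the sum of its local entries. The local values are assumptions chosen for illustration (they are not taken from the text); each local table has minimum 0 over the two values of its variable, which is what makes the resulting sum a stratified ranking. Checking a single ranking illustrates the proposition but does not replace the proof, which covers every stratified ranking.

```python
from itertools import product

ORDER = ("lo", "tk", "bd", "cs")
PARENTS = {"lo": (), "tk": (), "bd": ("lo",), "cs": ("tk", "bd")}


def local_rank(atom, value, parent_values):
    """Assumed local table kappa(X | parents); the values are illustrative."""
    if atom == "lo":
        return 1 if value else 0        # leaving the lights on is abnormal
    if atom == "tk":
        return 0                        # no preference about turning the key
    if atom == "bd":
        (lo,) = parent_values
        return 0 if value == lo else 1  # battery is dead iff lights were left on
    tk, bd = parent_values              # atom == "cs"
    expected = tk and not bd            # car starts iff key turned and battery ok
    return 0 if value == expected else 1


def kappa(world):
    """Eq. 5.4: the rank of a world is the sum of its local conditional ranks."""
    return sum(local_rank(x, world[x], tuple(world[p] for p in PARENTS[x]))
               for x in ORDER)


WORLDS = [dict(zip(ORDER, values)) for values in product([True, False], repeat=4)]


def rank(formula):
    """kappa(formula): minimum rank over the worlds satisfying the formula."""
    return min(kappa(w) for w in WORLDS if formula(w))


intended = rank(lambda w: w["lo"] and w["tk"] and not w["cs"])
unintended = rank(lambda w: w["lo"] and w["tk"] and w["cs"])
print(intended, unintended)  # 1 2
```

Under these assumed local tables the intended scenario (the car does not start) has rank 1 and the unintended one has rank 2, so lo ∧ tk |~ ¬cs holds in this ranking. The ranking also gives κ(lo ∧ tk) = κ(lo) + κ(tk), the independence of lo and tk that the discussion of Example 5.1 relies on.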

The  next  example  presents  a  simple  abduction  (or  backward  projection)  prob¬ 
lem,  and  permits  us  to  compare  the  behavior  of  c-entailment  with  that  of  chrono¬ 
logical  minimization  [118]. 

Example 5.2 (Unloading the gun.) Consider

A = {l0 → l1, l1 → l2, ..., ln−1 → ln}

standing for the various instances of “typically, if a gun is loaded at time ti, then
it is expected to remain loaded at time ti+1” (0 ≤ i < n). We say that a rule
li → li+1 is falsified by ω iff ω ⊨ li ∧ ¬li+1. A stratified ranking κ relative to A
can be constructed as follows:

    κ(ω) = number of rules in A falsified by ω    (5.7)

Given that the gun is loaded at t0 and that it is found unloaded at time
tn (i.e., l0 ∧ ¬ln is true), the scheme of chronological minimization will favor the
somewhat counterintuitive inference that the gun remained loaded until tn−1 (i.e.,
l1 ∧ ... ∧ ln−1 is true). c-entailment, on the other hand, only yields the weaker
conclusion that the gun must have been unloaded any time within t1 and tn−1
(i.e., ¬(l1 ∧ ... ∧ ln)), but the exact instant where the “unloading” of the gun
occurs remains uncertain.

Proposition 5.9 l0 ∧ ¬ln |~ ¬(l1 ∧ ... ∧ ln)

Proof: Follows trivially from the Deduction rule. The fact that we cannot point
out the exact moment in which the gun is unloaded follows from the ranking built
by Eq. 5.7, since all formulas representing these situations have equal ranking. □
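The ranking of Eq. 5.7 is simple enough to enumerate. The sketch below is an illustration with an arbitrarily chosen horizon n = 4 (an assumption; any small n behaves the same way) and hypothetical variable names. It lists the least surprising worlds compatible with the evidence l0 ∧ ¬ln and records, for each, the first time point at which the gun is unloaded.

```python
from itertools import product

N = 4  # atoms l0..l4; the horizon is an illustrative choice


def kappa(world):
    """Eq. 5.7: number of rules l_i -> l_{i+1} falsified by the world."""
    return sum(1 for i in range(N) if world[i] and not world[i + 1])


# A world is a tuple (l0, ..., lN) of truth values.
WORLDS = list(product([True, False], repeat=N + 1))
evidence = [w for w in WORLDS if w[0] and not w[N]]  # l0 and not ln
best = min(kappa(w) for w in evidence)
preferred = [w for w in evidence if kappa(w) == best]

# First time index at which the gun is unloaded in each preferred world.
unloading_times = sorted(w.index(False) for w in preferred)
print(best, unloading_times)  # 1 [1, 2, 3, 4]
```

Every preferred world falsifies exactly one rule, and the unloading can occur at any of the n transitions with the same rank. This is the sense in which the exact instant remains uncertain, in contrast with chronological minimization, which according to the text would keep the gun loaded as late as possible.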

c-entailment and chronological minimization are expected to yield the same
conclusions in problems of pure prediction, since enforcing ignorance of future
events is tantamount to the principle of modularity, which was shown to be
inherent to c-entailment. They differ, however, in tasks of abduction, as demonstrated
in Example 5.2. In this respect, c-entailment is closer to both motivated
action theory [121] and causal entailment [36]. However, contrary to the motivated
action theory, c-entailment automatically enforces specificity-based preferences,
which are natural consequences of the conditional interpretation of rules.15

We end this section by discussing the strict version of a causal rule denoted
by ⇒, which will be useful in representing non-defeasible causal influences in
Sections 5.3.2 and 5.4. Semantically, strict rules impose the following constraints
on the admissibility conditions of a ranking κ (Eq. 5.1): for each φ ⇒ ψ in the
knowledge base,

    \kappa(\psi \wedge \varphi) < \kappa(\neg\psi \wedge \varphi) = \infty, \text{ and } \kappa(\varphi) < \infty.    (5.8)

Intuitively, a strict conditional voids interpretations that render its antecedent
true and its consequent false by assigning them the lowest possible preference: a
rank κ equal to infinity.16 The following are two properties of strict rules:

Proposition 5.10 Let C1 ∧ ... ∧ Cn ⇒ E ∈ A.

1.  (Contraposition) If there exists a stratified ranking for A where κ(¬E) <
∞ then ¬E |~ ¬(C1 ∧ ... ∧ Cn).

2.  (Transitivity) If φ |~ ψ and ψ ⊨ (C1 ∧ ... ∧ Cn) then φ |~ E.
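The constraint of Eq. 5.8 and the contraposition property can be illustrated on the smallest possible case, a single strict rule a ⇒ b over two atoms. The ranking below is an assumption made for this sketch (the text does not give one), and the helper names are hypothetical. With a as the only parent of b and only two variables, Eq. 5.2 holds trivially, so any admissible ranking of this tiny network is stratified.

```python
import math

T, F = True, False
# Assumed ranking over worlds (a, b) for the strict rule a => b:
# the only world violating the rule, (a, not b), gets infinite rank.
KAPPA = {(T, T): 1, (T, F): math.inf, (F, T): 0, (F, F): 0}


def rank(formula):
    """kappa(formula): minimum rank of a satisfying world, infinity if none."""
    return min((k for w, k in KAPPA.items() if formula(*w)), default=math.inf)


def satisfies_strict(phi, psi):
    """Eq. 5.8: kappa(psi & phi) < kappa(not psi & phi) = inf, and kappa(phi) < inf."""
    return (rank(lambda a, b: phi(a, b) and psi(a, b)) < math.inf
            and rank(lambda a, b: phi(a, b) and not psi(a, b)) == math.inf
            and rank(phi) < math.inf)


def follows(phi, sigma):
    """phi |~ sigma in this single ranking (vacuously true if kappa(phi) = inf)."""
    if rank(phi) == math.inf:
        return True
    return (rank(lambda a, b: phi(a, b) and sigma(a, b))
            < rank(lambda a, b: phi(a, b) and not sigma(a, b)))


print(satisfies_strict(lambda a, b: a, lambda a, b: b))  # True
# Contraposition: kappa(not b) is finite here, so not b |~ not a.
print(follows(lambda a, b: not b, lambda a, b: not a))   # True
```

The contraposed conclusion holds because the only world with ¬b and a is the one excluded by the strict rule, while κ(¬b) itself stays finite, as the premise of the Contraposition property requires.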

These properties mirror the behavior of the material implication “⊃”, but the
resemblance is in fact only superficial. As discussed in Sections 2.1 and 2.7, the
semantic difference between a strict rule c ⇒ e and the wff c ⊃ e is that the
former expresses necessary, hence permanent, constraints while the latter expresses
information bound to the current situation. Thus, the former participates in
constraining the admissible rankings while the latter is treated as an
“observation” formula ¬c ∨ e, and can affect conclusions only by entering the antecedents
of queries. This difference is greatly accentuated when strict conditionals are
treated as causal rules, because stratified rankings are more sensitive to the rule
format. Indeed, contraposing the rule a ⇒ b into ¬b ⇒ ¬a changes the causal
relationship between a and b, and this change should be reflected in the resulting
ranking. Compare, for example, A1 = {c → ¬b, a ⇒ b} and A2 = {c → ¬b, ¬b ⇒ ¬a}. Any
stratified ranking for A1 must render a and c totally independent of each other,
as two unrelated causes of the variable B.

15We  remark  that  the  formalism  in  [121]  deals  with  a  much  richer  time  ontology  than  the 
formalism  presented  here,  and  with  a  first-order  language. 

16 This is equivalent to requiring that $P(\psi \mid \varphi) = 1$ and $P(\varphi) > 0$ (see 2.2).
5.3.1 c-Consistency


Chapter  2  proposed  a  norm  of  consistency,  called  p-consistency  for  rules  convey¬ 
ing  prototypical  information.  This  norm  and  its  associated  decision  procedure 
(Sec.  2.4)  were  shown  to  be  sufficient  when  rules  were  augmented  with  degrees 
of  strength  (Thm.  4.3).  The  semantical  requirements  of  stratification  induce  a 
new  notion  of  consistency,  specific  to  the  causal  interpretation  of  rules,  which  is 
radically  different  from  p-consistency. 

Definition 5.11 (c-Consistency) A network $\Delta$ is c-consistent iff there exists
at least one stratified ranking $\kappa$ for $\Delta$.

□ 

An example of a c-inconsistent network is $\Delta = \{tk \rightarrow cs, tk \wedge bd \rightarrow \neg cs, tk \rightarrow x, x \rightarrow bd\}$.17
To show that $\Delta$ is inconsistent, note that by Presupposition
(Thm. 5.5) we have $tk \mathrel{|\!\sim} \neg bd$, which implies that in all stratified rankings
$\kappa(\neg bd \wedge tk) < \kappa(bd \wedge tk)$. A simple application of Theorem 5.6 on $x \rightarrow bd$ (and the
proposition $tk$) yields $tk \wedge x \mathrel{|\!\sim} bd$. By the Defaults rule, $tk \mathrel{|\!\sim} x$, which together with
$tk \wedge x \mathrel{|\!\sim} bd$ and the Cut rule yields $tk \mathrel{|\!\sim} bd$, which in turn implies the
contradictory inequality $\kappa(bd \wedge tk) < \kappa(\neg bd \wedge tk)$. The lack of an appropriate causal
interpretation for this set of rules is not surprising. If we accept that $tk$ causes
$cs$, we should expect $\neg bd$ to hold by default when $tk$ is true. On the other hand,
if there is a causal path from $tk$ to $bd$, we should expect $bd$ to hold in the context
of $tk$. Note that this set is p-consistent.

We can find an admissible but not stratified ranking for $\Delta$ (see Table 5.3).18
This ranking depicts a situation in which the act of predicting the consequences
of turning the key seems to protect the battery against the damage inflicted by
$x$, and such a flow of events is indeed contrary to the common understanding of
causation. In fact, if we do not ascribe a causal character to the rules, we cannot
apply Theorem 5.6, and thus $tk \mathrel{|\!\sim} bd$ is not in the consequence relation of all
admissible rankings.
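
The claims about Table 5.3 can be checked by brute force. The sketch below is only an illustration: it assumes the usual reading of a ranking (the rank of a formula is the minimum rank of its models, and a default rule is admitted when its verifying worlds outrank its falsifying ones), and the helper names `kappa`, `kappa_cond` and `admits` are hypothetical. It confirms that every rule is admitted while the joint rank of $bd \wedge x \wedge tk$ differs from the sum along the chain, as stated in footnote 18.

```python
from itertools import product

INF = float("inf")
ATOMS = ("tk", "x", "bd", "cs")

# Ranks copied from Table 5.3; every world not listed there has rank 3.
LISTED = {
    (False, True, True, False): 0,  # (not tk, x, bd, not cs)
    (True, True, False, True): 1,   # (tk, x, not bd, cs)
    (True, True, True, False): 2,   # (tk, x, bd, not cs)
}


def kappa(formula):
    """Rank of a formula: minimum rank over the worlds that satisfy it."""
    ranks = [
        LISTED.get(values, 3)
        for values in product([False, True], repeat=len(ATOMS))
        if formula(dict(zip(ATOMS, values)))
    ]
    return min(ranks, default=INF)


def kappa_cond(formula, given):
    """Conditional rank: kappa(formula and given) - kappa(given)."""
    return kappa(lambda w: formula(w) and given(w)) - kappa(given)


def admits(antecedent, consequent):
    """Admissibility of a default rule antecedent -> consequent."""
    verified = kappa(lambda w: antecedent(w) and consequent(w))
    falsified = kappa(lambda w: antecedent(w) and not consequent(w))
    return verified < falsified


RULES = [
    (lambda w: w["tk"], lambda w: w["cs"]),                  # tk -> cs
    (lambda w: w["tk"] and w["bd"], lambda w: not w["cs"]),  # tk & bd -> not cs
    (lambda w: w["tk"], lambda w: w["x"]),                   # tk -> x
    (lambda w: w["x"], lambda w: w["bd"]),                   # x -> bd
]

all_admissible = all(admits(ante, cons) for ante, cons in RULES)

# Footnote 18: joint rank versus the sum along the chain tk -> x -> bd.
joint = kappa(lambda w: w["bd"] and w["x"] and w["tk"])
chain_sum = (
    kappa_cond(lambda w: w["bd"], lambda w: w["x"])
    + kappa_cond(lambda w: w["x"], lambda w: w["tk"])
    + kappa(lambda w: w["tk"])
)
print(all_admissible, joint, chain_sum)
```

The mismatch between `joint` and `chain_sum` is exactly what rules this ranking out as a stratified one, even though it satisfies every rule taken as a plain default.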

Another c-inconsistent set is $\Delta = \{a \Rightarrow c, b \Rightarrow \neg c\}$, which might arise when
we physically connect the outputs of two logic gates with conflicting functions.
Since neither $a$ nor $b$ has parents in $\Gamma_{(O,\Delta)}$, every stratified ranking (for $\Delta$) must
yield

$$\kappa(a \wedge b) = \kappa(a) + \kappa(b), \qquad (5.9)$$

17 This is the network used in Example 5.1 augmented with the two rules $tk \rightarrow x$ and $x \rightarrow bd$.

18 This ranking is not stratified for $\Delta$ since $\kappa(bd \wedge x \wedge tk) = 2$, but $\kappa(bd \mid x) + \kappa(x \mid tk) + \kappa(tk) = 1$,
which contradicts Eqs. 5.2 and 5.4.
$\kappa$    worlds
0    $(\neg tk, x, bd, \neg cs)$
1    $(tk, x, \neg bd, cs)$
2    $(tk, x, bd, \neg cs)$
3    Rest of the $\omega$'s

Table 5.3: Admissible ranking for $\{tk \rightarrow cs, tk \wedge bd \rightarrow \neg cs, tk \rightarrow x, x \rightarrow bd\}$.

implying that $a$ and $b$ are independent events. However, if each time we observe
$a$ we should expect $c$ and each time we observe $b$ we should expect $\neg c$, then $a$
and $b$ must be mutually exclusive, hence negatively correlated, events. Indeed,
since $\kappa(a \wedge b \wedge c) = \kappa(a \wedge b \wedge \neg c) = \infty$, we have $\kappa(a \wedge b) = \infty$, and Eq. 5.9 cannot
be satisfied unless either $a$ or $b$ is permanently false, thus defying the "possible
antecedent" requirement for strict rules (Eq. 5.8). Note that this $\Delta$ is again
p-consistent since, if it were not for the requirement of Eq. 5.9, an admissible
ranking could be constructed by simply excluding (by setting $\kappa = \infty$) any $\omega$ such
that $\omega \models a \wedge b$, which would still permit us to assign $\kappa(a) = \kappa(b) < \infty$.
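
The conflict between Eq. 5.8 and Eq. 5.9 can be made concrete with a small sketch. It assumes the simplest admissible ranking described above, in which every world that violates a strict rule receives rank infinity and every other world receives rank 0; the function names are illustrative only.

```python
from itertools import product

INF = float("inf")


def rank(a, b, c):
    """Rank 0 for worlds allowed by a => c and b => not c, infinity otherwise."""
    violates_a_c = a and not c
    violates_b_not_c = b and c
    return INF if (violates_a_c or violates_b_not_c) else 0


def kappa(formula):
    """Rank of a formula: minimum rank over the worlds that satisfy it."""
    return min(
        (rank(a, b, c)
         for a, b, c in product([False, True], repeat=3)
         if formula(a, b, c)),
        default=INF,
    )


k_a = kappa(lambda a, b, c: a)          # finite: the antecedent a is possible
k_b = kappa(lambda a, b, c: b)          # finite: the antecedent b is possible
k_ab = kappa(lambda a, b, c: a and b)   # infinite: both a-and-b worlds are excluded

# Stratification (Eq. 5.9) would require k_ab == k_a + k_b.
print(k_a, k_b, k_ab, k_ab == k_a + k_b)
```

Both antecedents stay possible, so the ranking is admissible, yet the additive constraint fails. Any other finite choice of ranks for the surviving worlds leaves `k_ab` infinite and the sum finite, so the failure does not depend on the choice of rank 0.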

5.3.2  Accountability:  A  Framework  For  Explanations 

Causality  is  a  worthy  abstraction  of  complex  interactions  in  as  much  as  it  proves 
itself  useful  for  formulating  predictions  and  explanations  for  modeling  these  in¬ 
teractions.  The  bulk  of  the  effort  in  previous  sections  was  spent  in  incorporating 
into  ranking  representations  properties  associated  with  causality  and  showing 
how  these  properties  can  be  used  to  facilitate  prediction.  In  this  section  we 
concentrate  on  producing  plausible  explanations  for  a  given  set  of  observations. 

For example, once we are told that "turning the ignition key causes the car
engine to start" we would like to explain a running car engine by conjecturing
that somebody must have turned the ignition key. However, $cs \mathrel{|\!\sim} tk$ is not a
c-entailed conclusion from the network $\Delta = \{tk \rightarrow cs\}$. The problem is that we
haven't provided any information in $\Delta$ that establishes the starting of the car as
a phenomenon that in itself needs to be explained. The rule $tk \rightarrow cs$ only imposes
two constraints on the rankings of possible worlds: First, $cs$ should hold in all
the most preferred models for $tk$ and, second, once $tk$ is known to be true we can
expect $cs$ to hold independently of any event prior to $tk$. But this says little about
the models of $\neg cs$ and, in particular, whether the car will start without turning
the ignition key. In normal discourse, when given a set of rules "$C_i$ causes $E$" we
usually subscribe (by convention) to additional assumptions that help complete
this information.19 Three of the most common assumptions are:

•  Accountability:  An  effect  E  is  presumed  false  if  all  the  conditions  listed 
as  causes  of  E  are  also  false.20 

•  Exception Independence: The rules representing the "cause-effect" relations
may admit exceptions which inhibit the occurrence of the effect even
in the presence of the cause. However, unless explicitly stated or logically
implied, these exceptions are presumed independent. In the car example, a
dead battery and an empty gas tank can be considered as such exceptions to
$tk \rightarrow cs$. Both will prevent the car engine from starting, and are presumed
to be independent of each other.

•  Disjunctive  Interaction:  The  likelihood  of  an  event  does  not  diminish 
when  several  of  its  causes  prevail  simultaneously.  For  example,  if  rain  and 
sprinkler-on  are  each  a  cause  for  the  grass  being  wet,  the  grass  will  be  only 
more  likely  to  be  wet  if  both  the  sprinkler  is  turned  on  and  it  is  raining.21 

A probabilistic model that captures these assumptions, called the noisy-or gate, is
described in [97], where it is proposed as a canonical model of disjunctive interaction
among causes $C_1, \ldots, C_n$ that predict the same effect $E$. The noisy-or gate
is depicted schematically in Fig. 5.3. The set $\{I_1, \ldots, I_n\}$ represents inhibitors,
where each $I_i$ stands for an abnormality that would interfere with the causal
connection between $C_i$ and $E$. Every pair $C_i$ and $I_i$ constitutes the inputs to an
and-gate so that if $C_i$ is "active" (or true) and $I_i$ is not known to be active, then
the output $s_i$ will provide support for the effect $E$. Each $s_i$ is then an input to
the final or-gate. If one or more of these $s_i$'s is active then $E$ is expected to be
true, and if all are false, then $E$ is expected to be false.

Both and-gates and or-gates impose functional constraints on propositions;
thus, in order to represent their behavior strict rules are necessary. The rules in
Eqs. 5.10-5.13 formalize the intended behavior of an and-gate:

$$C_i \wedge \neg I_i \Rightarrow s_i \qquad (5.10)$$

19See  [68]  for  a  discussion  on  the  relation  between  completing  the  information  and  abduction 
reasoning  for  tasks  of  producing  explanations  from  observations. 

20This  may  require  that  we  lump  together  all  unknown  causes  of  E  under  the  heading  “all 
other  causes” . 

21This  assumption  actually  follows  from  that  of  exception  independence. 
Figure 5.3: The noisy-or interaction model: $C_1, \ldots, C_n$ are the set of causes for
$E$, and each $I_i$ represents an inhibitor or abnormality for $C_i \rightarrow E$.

"If $C_i$ is true and $I_i$ is not active, then there is support $s_i$ for $E$",22

$$\neg C_i \Rightarrow \neg s_i \qquad (5.11)$$

"If the cause $C_i$ is not active there is no support $s_i$ for $E$",

$$I_i \Rightarrow \neg s_i \qquad (5.12)$$

"If $I_i$ is active there is no support $s_i$ for $E$",

$$True \rightarrow \neg I_i \qquad (5.13)$$

"$I_i$ is an abnormality, so it is false by default".

The or-gate represents the interaction between the set of causal rules for the
effect $E$, with propositions $s_1, \ldots, s_n$ as inputs and the literal constant $E$ as
output. The behavior of this gate is governed by a pair of strict rules:23

$$s_1 \vee \ldots \vee s_n \Rightarrow E, \qquad (5.14)$$

and a closure rule

$$\neg(s_1 \vee \ldots \vee s_n) \Rightarrow \neg E. \qquad (5.15)$$

This last rule incorporates the assumption of accountability: If there is no causal
support for $E$, then $E$ must be false. Strict rules are necessary to simulate
the disjunctive nature of the or-gate since $s \wedge s' \mathrel{|\!\sim} e$ is not c-entailed from
$\Delta = \{s \vee s' \rightarrow e\}$.

22 This rule is reminiscent of the proposed encoding of defaults under circumscription using
the ab predicate suggested by McCarthy [88].

23 Note that we could have equivalently encoded Eq. 5.14 as a set of rules $s_i \Rightarrow E$, $1 \leq i \leq n$,
since $n$ applications of the Disjunction rule of inference (Thm. 5.4) on $s_i \Rightarrow E$ will in fact yield
$s_1 \vee \ldots \vee s_n \Rightarrow E$.
Figure 5.4: A schematic view of the car example (nodes $tk$, $et$, $bd$ and $cs$).

Proposition 5.12 Given a network $\Delta$ such that $s_1 \vee \ldots \vee s_n \Rightarrow E \in \Delta$, then
$(\bigwedge_{i=j}^{k} s_i) \mathrel{|\!\sim} E$, where $1 \leq j \leq k \leq n$.

Thus, given a set of causal relations "$C_i$ causes $E$", $1 \leq i \leq n$, we can use the
rules in Eqs. 5.14 and 5.15 for modeling the or-gate and the rules in Eqs. 5.10-5.13
for modeling the and-gates.
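
The functional (strict) part of this encoding can be sketched as a pair of Boolean functions. This is only an illustration of the gate behavior in Eqs. 5.10-5.12, 5.14 and 5.15, not of the ranking semantics: the default of Eq. 5.13 is approximated by treating unspecified inhibitors as false, and the names `and_gate` and `noisy_or` are hypothetical.

```python
def and_gate(cause, inhibitor):
    """Support s_i: present iff C_i holds and I_i does not (Eqs. 5.10-5.12)."""
    return cause and not inhibitor


def noisy_or(causes, inhibitors=None):
    """Effect E: true iff at least one support is active (Eqs. 5.14-5.15)."""
    if inhibitors is None:
        # Stand-in for Eq. 5.13: inhibitors are false unless stated otherwise.
        inhibitors = [False] * len(causes)
    supports = [and_gate(c, i) for c, i in zip(causes, inhibitors)]
    return any(supports)


# Two causes, the first one inhibited: the second still supports E.
effect_one_inhibited = noisy_or([True, True], [True, False])
# Accountability: with no active cause, E is presumed false.
effect_no_cause = noisy_or([False, False])
print(effect_one_inhibited, effect_no_cause)
```

The closure rule of Eq. 5.15 appears here as the fact that `any` of an all-false list is false, so an effect with no support is never asserted.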

In many cases we have explicit knowledge of the identity of the mechanisms
capable of inhibiting the normal causal connection between $C_i$ and $E$. Their
interactions can be modeled in the same fashion, using and-gates and or-gates
as building blocks. For example, given that "turning the key ($tk$) causes the car
engine to start ($cs$)" and two mechanisms that might inhibit this relation, namely
a dead battery ($bd$) and an empty gas tank ($et$), we would require a noisy-or for
the $tk \rightarrow cs$ causal relation:24

$$tk \wedge \neg I_{tk} \Rightarrow cs \; ; \quad True \rightarrow \neg I_{tk} \qquad (5.16)$$

$$I_{tk} \Rightarrow \neg cs \; ; \quad \neg tk \Rightarrow \neg cs \qquad (5.17)$$

and another noisy-or to model the interaction between the two causes $bd$ and $et$
for the inhibitor $I_{tk}$. Let's assume, for simplicity, that these causal relations are
strict and void of any inhibitors themselves. Thus, we can simplify this noisy-or
to a standard or-gate:

$$bd \vee et \Rightarrow I_{tk} \; ; \quad \neg(bd \vee et) \Rightarrow \neg I_{tk} \qquad (5.18)$$

Fig. 5.4 presents a schematic view of this example and Table 5.4 contains a stratified
ranking. Some c-consequences of the rules in Eqs. 5.16-5.18 are:

$$tk \mathrel{|\!\sim} cs \; ; \quad cs \mathrel{|\!\sim} tk \; ; \quad tk \wedge bd \mathrel{|\!\sim} \neg cs \qquad (5.19)$$

24 Since in this case there is only one cause for $cs$, we simplify the encoding and skip the final
or-gate.
$\kappa$    worlds $\omega$
0    $(tk, \neg I, \neg bd, \neg et, cs)$, $(\neg tk, \neg I, \neg bd, \neg et, \neg cs)$
1    $(tk, I, bd, \neg et, \neg cs)$, $(tk, I, \neg bd, et, \neg cs)$,
     $(\neg tk, I, \neg bd, et, \neg cs)$, $(\neg tk, I, bd, \neg et, \neg cs)$
2    $(tk, I, bd, et, \neg cs)$, $(\neg tk, I, bd, et, \neg cs)$

Table 5.4: A stratified ranking for the car example.

Given the independence constraints embedded in our formalism, any stratified
ranking for the rules in Eqs. 5.16-5.18 will comply with $\kappa(bd \wedge et) = \kappa(bd) + \kappa(et)$,
making a world where both $bd$ and $et$ are true more abnormal than one in which
only one of them holds. Thus, in the situation in which we turn the key and the
car engine does not start, c-entailment concludes that either the battery is dead
or the gas tank is empty, but not both:

$$tk \wedge \neg cs \mathrel{|\!\sim} ((bd \wedge \neg et) \vee (\neg bd \wedge et)). \qquad (5.20)$$
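
The consequences in Eqs. 5.19 and 5.20 can be verified against the particular ranking of Table 5.4. The sketch below assumes that ranking only (it does not range over all stratified rankings), generates just the worlds that satisfy the strict rules of Eqs. 5.16-5.18, and uses hypothetical helper names `kappa` and `entails`.

```python
from itertools import product

INF = float("inf")


def worlds():
    """Worlds consistent with the strict rules; all others have rank infinity."""
    for tk, bd, et in product([False, True], repeat=3):
        inhibited = bd or et          # Eq. 5.18
        cs = tk and not inhibited     # Eqs. 5.16-5.17
        yield {"tk": tk, "bd": bd, "et": et, "I": inhibited, "cs": cs}


def rank(w):
    """Rank from Table 5.4: one unit of surprise per active cause of I_tk."""
    return int(w["bd"]) + int(w["et"])


def kappa(formula):
    return min((rank(w) for w in worlds() if formula(w)), default=INF)


def entails(phi, psi):
    """phi |~ psi in this ranking: kappa(phi & psi) < kappa(phi & not psi)."""
    return (kappa(lambda w: phi(w) and psi(w))
            < kappa(lambda w: phi(w) and not psi(w)))


def context(w):
    """The key was turned and the car did not start."""
    return w["tk"] and not w["cs"]


def exactly_one(w):
    return w["bd"] != w["et"]


prediction = entails(lambda w: w["tk"], lambda w: w["cs"])      # tk |~ cs
explanation = entails(lambda w: w["cs"], lambda w: w["tk"])     # cs |~ tk
diagnosis = entails(context, exactly_one)                       # Eq. 5.20
blames_battery = entails(context, lambda w: w["bd"])
print(prediction, explanation, diagnosis, blames_battery)
```

With equal ranks for the two inhibitors the exclusive disjunction is believed, but neither the dead battery nor the empty tank is singled out.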

In Section 5.3.3 we explore mechanisms to add degrees of strength to the rules
using the formalism described in [53], so that the degrees of support for each
hypothesis can be used to manipulate the focus of the diagnosis process. To
complete the encoding of the causal relations in Example 5.1 we add an and-gate
representing the causal relation between "head lights on all night" ($lo$) and the
dead battery ($bd$):

$$lo \wedge \neg I_{lo} \Rightarrow bd \; ; \quad True \rightarrow \neg I_{lo} \qquad (5.21)$$

$$I_{lo} \Rightarrow \neg bd \; ; \quad \neg lo \Rightarrow \neg bd \qquad (5.22)$$

This proposal of model completion requires that the set of causes for a given
effect be both identifiable and separable from the set of causes that prevent the
effect, i.e. the inhibitors. One way of establishing this difference is by eliciting the
information directly from the rule encoder: For each effect $E$ we would ask for a
list of causes $C_1, \ldots, C_n$ and a list of events (causes or effects) that might prevent
$E$ from occurring. Another way is to allow the input of the causal relations to
be specified in the same language of networks. Then the system would compile
this network into a target network containing rules such as those in Eqs. 5.16-5.18, filling in
the assumptions of the noisy-or model using and-gates and or-gates as building
blocks. This process would examine the rules family by family,25 following the
stratified order imposed by the underlying graph of $\Delta$. In case of conflicting
relations, i.e. a set of literals supporting an effect $e$ and another set supporting
$\neg e$, the system will try to separate the inhibitors from the causes by using the rule
of Presupposition in Theorem 5.5. This rule of inference says that if $C_i \mathrel{|\!\sim} E$ and
$C_i \wedge I_i \mathrel{|\!\sim} \neg E$ then $C_i \mathrel{|\!\sim} \neg I_i$. Thus, in the car example we would have the set
$\Delta = \{tk \rightarrow cs, bd \wedge tk \Rightarrow \neg cs, et \wedge tk \Rightarrow \neg cs\}$ as input, and a simple application
of the rule of Presupposition will mark both $bd$ and $et$ as inhibitors with respect
to $tk$. Note that cases of ambiguous families like $\{a \rightarrow c, b \rightarrow \neg c\}$ would require
further information about the relation between $a$ and $b$, since first we cannot
distinguish between causes and inhibitors, and second, an encoding of both these
rules as and-gates will result in a c-inconsistent network similar to the example of
conflicting strict arrows in Section 5.3.1. The problem is that the assumption of
exception independence is no longer valid: The cause for $c$ (i.e., $a$) is the inhibitor
for the cause of $\neg c$ (i.e., $b$) and vice versa.


5.3.3  The  most  normal  stratified  ranking 

In  Section  5.3.2  we  saw  that  in  order  to  reap  the  full  benefits  of  Bayesian  Net¬ 
works,  we  needed  to  supplement  the  constraints  of  A  with  additional  informa¬ 
tion  that  further  shapes  the  conditional  rankings  of  each  family  in  the  underlying 
DAG.  Another  approach  of  supplementing  the  missing  information  is  to  establish 
a  preference  relation  among  stratified  rankings  and  rule  out  those  rankings  that 
are  less  preferred  than  others.  Since  a  lower  ranking  is  associated  with  greater 
normality,  it  is  natural  that  out  of  the  set  of  all  admissible  stratified  rankings 
we  prefer  those  that  assign  to  interpretations  the  lowest  possible  ranks,  and  then 
define  the  entailment  relation  with  respect  to  this  set  of  privileged  rankings. 

Such a strategy was adopted in Chapter 4 (without the requirement of stratification)
and led to system-$Z^+$. The incorporation of the most-normal strategy
in the context of stratified rankings will result in a substantial increase of
expressiveness. For example, in the noisy-or encoding of causal relations, assuming
independence among the inhibitors, the most-normal (minimal) stratified ranking
is given by the following function:

$$\kappa^{+}(\omega) = \text{number of inhibitors that are true in } \omega \qquad (5.23)$$

Thus, in the car example (with dead battery and empty tank as inhibitors) in
Eqs. 5.16-5.18, the minimal ranking $\kappa^{+}(\omega)$ will be 0, 1 or 2 depending on whether
$\omega \models \neg bd \wedge \neg et$, $\omega \models (bd \wedge \neg et) \vee (\neg bd \wedge et)$ or $\omega \models bd \wedge et$ respectively (this
minimal ranking is depicted in Table 5.4).

25 A family is the set of propositions composed by the parent set of an effect and the effect
itself (Section 5.2).

The incorporation of variable-strength rules in this context is especially useful:
From knowing that we turned the key but the car did not start, it follows that either
the battery is dead or that the tank is empty ($tk \wedge \neg cs \mathrel{|\!\sim} ((bd \wedge \neg et) \vee (\neg bd \wedge et))$).
However, if we knew that an empty tank is more likely than a dead battery, we
could encode this information as $True \xrightarrow{\delta_1} \neg et$ and $True \xrightarrow{\delta_2} \neg bd$ with $\delta_1 < \delta_2$. The
minimal ranking $\kappa^{+}(\omega)$ in this case will be 0, $\delta_1 + 1$, $\delta_2 + 1$, or $\delta_1 + \delta_2 + 1$ depending
on whether $\omega \models \neg bd \wedge \neg et$, $\omega \models (\neg bd \wedge et)$, $\omega \models (bd \wedge \neg et)$ or $\omega \models bd \wedge et$ respectively.
Now given the context that we turn the key and the car does not start, our
primary suspect would be the lack of gasoline, $tk \wedge \neg cs \mathrel{|\!\sim^{*}} et$, where $\mathrel{|\!\sim^{*}}$ denotes
the consequence relation of the most-normal stratified ranking (which is unique
for this example).
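
The effect of unequal strengths on the diagnosis can be illustrated numerically. The sketch below takes the four rank values exactly as stated in the text (0, $\delta_1 + 1$, $\delta_2 + 1$, $\delta_1 + \delta_2 + 1$) rather than deriving them, and the concrete strengths $\delta_1 = 1$, $\delta_2 = 2$ as well as the helper names are assumptions made for the example.

```python
from itertools import product

INF = float("inf")


def worlds():
    """Worlds consistent with the strict rules of Eqs. 5.16-5.18."""
    for tk, bd, et in product([False, True], repeat=3):
        cs = tk and not (bd or et)
        yield {"tk": tk, "bd": bd, "et": et, "cs": cs}


def make_rank(delta_et, delta_bd):
    """Rank values as given in the text for strengths delta_1 (et), delta_2 (bd)."""
    def rank(w):
        if w["bd"] and w["et"]:
            return delta_et + delta_bd + 1
        if w["et"]:
            return delta_et + 1
        if w["bd"]:
            return delta_bd + 1
        return 0
    return rank


def entails(rank, phi, psi):
    """phi |~* psi with respect to the given ranking."""
    def kappa(formula):
        return min((rank(w) for w in worlds() if formula(w)), default=INF)
    return (kappa(lambda w: phi(w) and psi(w))
            < kappa(lambda w: phi(w) and not psi(w)))


def context(w):
    """The key was turned and the car did not start."""
    return w["tk"] and not w["cs"]


unequal = make_rank(delta_et=1, delta_bd=2)   # empty tank is the likelier fault
equal = make_rank(delta_et=1, delta_bd=1)

suspect_tank = entails(unequal, context, lambda w: w["et"])
suspect_battery = entails(unequal, context, lambda w: w["bd"])
suspect_tank_if_equal = entails(equal, context, lambda w: w["et"])
print(suspect_tank, suspect_battery, suspect_tank_if_equal)
```

Only when the strengths differ does the ranking commit to a single suspect; with equal strengths neither fault is preferred.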

Note, as shown in Table 5.5, that the most-normal stratified ranking may not
be unique for the network $\Delta = \{a \rightarrow c, b \rightarrow \neg c\}$. Therefore, we need to define
entailment in minimal rankings, denoted by $\mathrel{|\!\sim^{*}}$, with respect to the consequence
relations of all most-normal stratified rankings.


$\kappa$    Ranking 1
0    $(\neg a, b, \neg c)$, $(\neg a, \neg b, c)$, $(\neg a, \neg b, \neg c)$
1    $(\neg a, b, c)$, $(a, b, c)$, $(a, \neg b, c)$
2    $(a, b, \neg c)$, $(a, \neg b, \neg c)$

$\kappa$    Ranking 2
0    $(a, \neg b, c)$, $(\neg a, \neg b, c)$, $(\neg a, \neg b, \neg c)$
1    $(\neg a, b, \neg c)$, $(a, b, \neg c)$, $(a, \neg b, \neg c)$
2    $(a, b, c)$, $(\neg a, b, c)$

Table 5.5: Two minimal rankings for $\{a \rightarrow c, b \rightarrow \neg c\}$


It is not clear at this point whether this loss of uniqueness will result in a
substantial increase in computational complexity. Although we may lose the
semi-tractability of system-$Z^+$, we can still exploit the topological properties
of the characteristic DAG ($\Gamma_{(O,\Delta)}$) to render practical reasoning feasible. It is
well known that the local propagation techniques of Bayesian Networks can be
extended to sparse networks by embedding the network in a hypertree (or acyclic
database) [115]. Thus, it is quite feasible that similar techniques could be applied
to the computation of consequences in systems governed by the "most-normal"
completion of stratified rankings.


5.4  Belief  Update 

Section 4.6 demonstrated how the semantics of model ranking, together with
the syntactic machinery developed for system-$Z^+$, can be applied to manage the
tasks of belief revision, in conformity with the AGM postulates. The introduction
of stratified ranking adds the capability for implementing a new type of belief
change, named update by Katsuno and Mendelzon (KM) [65]. In both tasks (belief
revision and belief update) we seek to incorporate a new piece of information
$\phi$ into an existing set of beliefs $\psi$. Yet, in belief revision $\phi$ is assumed to be a
piece of evidence, while in update $\phi$ is treated as a change occurring by external
intervention. Katsuno and Mendelzon [65] have shown that the AGM postulates
are inadequate for describing changes caused by updates, for which they have
proposed new sets of postulates. The basic difference between revision and update
is that the latter permits changes in each possible world independently, as
was proposed by Winslett [127].26

Belief update can be embodied in a stratified ranking system using the following
device: For each instruction to "update the knowledge base by $\phi$" we add
a set of rules that simulates the action "$do(\phi)$, leaving everything else constant
(whenever possible)", and then condition $\kappa$ on the truth of $do(\phi)$. The following
set of causal rules embodies the intent of this action, where $\phi$ and $\phi'$ stand for "$\phi$
holds at $t$" and "$\phi$ holds at $t' > t$", respectively:27

$$\phi \rightarrow \phi' \qquad (5.24)$$

$$\neg\phi \rightarrow \neg\phi' \qquad (5.25)$$

$$do(\phi) \Rightarrow \phi'. \qquad (5.26)$$

The  following  example  (adapted  from  Winslett  [127])  demonstrates  how  this  de¬ 
vice  differentiates  between  update  and  revision. 

26In  the  language  of  Bayesian  networks,  the  difference  between  updates  and  revisions  parallels 
the  distinction  between  causal  and  evidential  information  [96]. 

27 The two persistence rules, Eqs. 5.24 and 5.25, are presumed to apply between any two
atomic propositions at two successive times.
Example 5.3 (XOR-gate) An XOR Boolean gate $c = XOR(a, b)$ is examined
at two different times. At time $t$, we observe the output $c = true$ and conclude
that one of the inputs $a$ or $b$ must be true, but not both. At a later time $t'$ we
learn that $b'$ is true (primed letters denote propositions at time $t'$), and we wish
to change our beliefs (in $a$ and $a'$) accordingly. Naturally, this change should
depend on how the truth of $b'$ is learned. If we learn $b'$ by measuring the voltage
on the $b$ terminal of the gate, then we have a belief revision process on our hands,
and we expect $a'$ to be false. On the other hand, if we learn that $b'$ is true as a
result of physically connecting the $b$ terminal to a voltage source, we no longer
expect $a'$ to be false, since we have no reason to believe that the output $c$ has
retained its truth value in the process.

In the stratified ranking formulation, the knowledge base corresponding to this
example will consist of three components:

1. The functional description of the XOR gate at times $t$ and $t'$,

$$a \wedge b \Rightarrow \neg c \; ; \quad \neg a \wedge \neg b \Rightarrow \neg c \qquad (5.27)$$

$$a \wedge \neg b \Rightarrow c \; ; \quad \neg a \wedge b \Rightarrow c, \qquad (5.28)$$

and an equivalent set of rules for $a'$, $b'$, $c'$.

2. The persistence rules: For every $x$ in $\{a, b, c\}$,

$$x \rightarrow x' \; ; \quad \neg x \rightarrow \neg x'. \qquad (5.29)$$

3. The action $do(b)$, which represents the external influence on $b'$:

$$do(b) \Rightarrow b'. \qquad (5.30)$$

The underlying graph for the network $\Delta$ corresponding to this knowledge base
is depicted in Figure 5.5.

Initially, after observing $c$, our evidence consists only of $c$. The minimal
stratified ranking $\kappa_c$ for a $\Delta$ consisting of the rules in Eqs. 5.27-5.30 is depicted in
Table 5.6. To represent belief revision, we add $b'$ to our evidence set and query
whether $c \wedge b' \mathrel{|\!\sim^{*}} \neg a'$.28 In contrast, to represent belief update, we add $do(b)$ to
our evidence set and query whether $(c \wedge b' \wedge do(b)) \mathrel{|\!\sim^{*}} \neg a'$.

It is easy to show that the first query is answered in the affirmative, while
the second in the negative. The left-hand side of Table 5.7 shows the ranking

28 Recall that $\mathrel{|\!\sim^{*}}$ denotes the consequence relation of the minimal stratified ranking for $\Delta$
(see Sec. 5.3.3), which is unique for this example.
Figure 5.5: Graph depicting the causal dependencies in Example 5.3


$\kappa_c$    $\neg do(b)$    |    $do(b)$
0    $(\neg a, b, \neg a', b')$, $(a, \neg b, a', \neg b')$    |
1    $(\neg a, b, a', b')$, $(a, \neg b, \neg a', \neg b')$, $(a, \neg b, a', b')$    |    $(\neg a, b, \neg a', b')$, $(a, \neg b, a', b')$
2    $(\neg a, b, a', \neg b')$, $(a, \neg b, \neg a', b')$    |    $(\neg a, b, a', b')$, $(a, \neg b, \neg a', b')$
$\infty$    models for $\neg c$    |    models for $\neg c$

Table 5.6: Minimal stratified ranking for Example 5.3 after $c$ is observed


$\kappa$    Revision $\kappa_c(\omega \mid b')$    |    Update $\kappa_c(\omega \mid do(b))$
0    $(\neg a, b, \neg a', b')$    |    $(\neg a, b, \neg a', b')$, $(a, \neg b, a', b')$
1    $(\neg a, b, a', b')$, $(a, \neg b, a', b')$    |    $(\neg a, b, a', b')$, $(a, \neg b, \neg a', b')$
2    $(a, \neg b, \neg a', b')$    |
$\infty$    models for $\neg b'$    |    models for $\neg do(b)$

Table 5.7: Rankings after observing $b'$ and after "doing" $b$
resulting from revising the ranking in Table 5.6 by $b'$ (first query), while the
right-hand side shows the ranking after updating by $do(b)$ (second query). Note
that in the revised ranking the only world in the zero rank is a model for $\neg a'$,
while the updated ranking shows an additional world which is a model for $a'$
(the state of the output $c$ in this world changed as a consequence of the action).
The action $do(b)$ establishes the truth of $b'$ but has no effect on what we believe
about the second input $a'$. Since neither $a$ nor $\neg a$ was believed at $t$, they remain
unbelieved at $t'$.
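
The two queries of Example 5.3 can be replayed directly on the ranking after $c$ is observed. In the sketch below the rank values are transcribed from Table 5.6 as read from the source (worlds not listed are treated as having rank infinity), each world is a tuple `(a, b, a2, b2, do_b)` with `a2`, `b2` standing for $a'$, $b'$, and the helper names are hypothetical.

```python
INF = float("inf")

# (a, b, a', b', do(b)) -> rank, transcribed from Table 5.6.
RANKS = {
    (False, True, False, True, False): 0,
    (True, False, True, False, False): 0,
    (False, True, True, True, False): 1,
    (True, False, False, False, False): 1,
    (True, False, True, True, False): 1,
    (False, True, True, False, False): 2,
    (True, False, False, True, False): 2,
    (False, True, False, True, True): 1,
    (True, False, True, True, True): 1,
    (False, True, True, True, True): 2,
    (True, False, False, True, True): 2,
}


def kappa(formula):
    """Rank of a formula: minimum rank over the listed worlds satisfying it."""
    return min((r for w, r in RANKS.items() if formula(*w)), default=INF)


def entails(evidence, conclusion):
    """evidence |~* conclusion with respect to the ranking above."""
    return (kappa(lambda *w: evidence(*w) and conclusion(*w))
            < kappa(lambda *w: evidence(*w) and not conclusion(*w)))


def observed_b2(a, b, a2, b2, do_b):
    return b2


def did_b(a, b, a2, b2, do_b):
    return b2 and do_b


def not_a2(a, b, a2, b2, do_b):
    return not a2


revision_concludes_not_a2 = entails(observed_b2, not_a2)
update_concludes_not_a2 = entails(did_b, not_a2)
print(revision_concludes_not_a2, update_concludes_not_a2)
```

Observing $b'$ supports $\neg a'$, whereas after $do(b)$ the worlds with $a'$ and with $\neg a'$ are tied at the lowest rank, so neither is believed.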

5.4.1  The  dynamics  of  belief  update 

The example above demonstrates that, given a ranking $\kappa$ and a network $\Delta$, it is
possible to predict how a system would respond to external interventions. For
example, if we wish to inquire whether event $e$ will hold true after we force some
variable $A$ to become true, we simply add to $\Delta$ the rule $do(a) \Rightarrow a$,29 recompute
the resulting stratified ranking $\kappa'$ on the augmented set of variables (including
$do(a)$), and then compute $\kappa'(e \mid do(a))$. There is a simple relation between $\kappa(e \mid a)$
and $\kappa'(e \mid do(a))$, which results in a direct transformation between two ranking
functions, $\kappa(\omega)$ and $\kappa_{do(a)}(\omega)$, the latter being an abbreviation of $\kappa'(\omega \mid do(a))$.
From Eq. 5.4 we have that:

$$\kappa(X_n \wedge \ldots \wedge A \wedge \ldots \wedge X_1) = \sum_{i=1}^{n} \kappa(X_i \mid Par_{X_i}) \qquad (5.31)$$

$$\kappa(X_n \wedge \ldots \wedge A \wedge \ldots \wedge X_1) = \sum_{i \neq j} \kappa(X_i \mid Par_{X_i}) + \kappa(A \mid Par_A) \qquad (5.32)$$

where $A$ is the $j$th literal taking values from $\{a, \neg a\}$. Similarly, the stratification
of $\kappa'$ relative to $\Delta \cup \{do(a) \Rightarrow a\}$ dictates

$$\kappa'(X_n \wedge \ldots \wedge A \wedge \ldots \wedge X_1 \wedge DO(a)) = \sum_{i \neq j} \kappa'(X_i \mid Par_{X_i}) + \kappa'(A \mid Par'_A) + \kappa'(DO(a)) \qquad (5.33)$$

29  We  use  lowercase  to  denote  the  instantiation  of  variable  A  to  a  truth  value. 
where $DO(a)$ is a variable taking values from $\{do(a), \neg do(a)\}$, $Par'_A = Par_A \cup \{DO(a)\}$, and

$$\kappa'(A \mid Par'_A) = \begin{cases} \kappa(A \mid Par_A) & \text{if } DO(a) = \neg do(a), \\ 0 & \text{if } A = a \text{ and } DO(a) = do(a), \\ \infty & \text{if } A = \neg a \text{ and } DO(a) = do(a). \end{cases} \qquad (5.34)$$

Eq. 5.33 reflects the fact that the action variable $DO(a)$ is a root node in $\Gamma_{(O,\Delta)}$,
since it is under the sole control of the rule author, while Eq. 5.34 reflects the
constraint $do(a) \Rightarrow a$. Since the new rule $do(a) \Rightarrow a$ only affects the family
of $A$, we have that the summation term in Eq. 5.33 is equal to the summation
term in Eq. 5.32. Conditioning Eq. 5.33 on $do(a)$ and making the appropriate
substitutions yields

$$\kappa'(X_n \wedge \ldots \wedge A \wedge \ldots \wedge X_1 \mid do(a)) = \kappa(X_n \wedge \ldots \wedge A \wedge \ldots \wedge X_1) - \kappa(A \mid Par_A) + \kappa'(A \mid Par_A \wedge do(a)) \qquad (5.35)$$

where, according to Eq. 5.34, $\kappa'(A \mid Par_A \wedge do(a)) = 0$ when $A = a$, and
$\kappa'(A \mid Par_A \wedge do(a)) = \infty$ when $A = \neg a$. Thus, making again the appropriate substitutions we
get

$$\kappa_{do(a)}(\omega) = \begin{cases} \kappa(\omega) - \kappa(a \mid Par_A(\omega)) & \text{if } \omega \models a, \\ \infty & \text{if } \omega \models \neg a. \end{cases} \qquad (5.36)$$

In other words, the $\kappa$ of each world $\omega$ satisfying $a$ is reduced by an amount equal
to the degree of surprise of finding $A = true$, given the realization of $Par_A$ in $\omega$
(denoted by $Par_A(\omega)$). The $\kappa$ of each world falsifying $a$ is of course $\infty$.30
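
Eq. 5.36 can be tried out on a two-variable chain. The sketch below is a toy instance, not taken from the text: it assumes a single proposition observed at two times (`prev` and `cur`), a prior rank of 1 for `prev` being true, and persistence rules that charge one unit of surprise for any change. It contrasts the update $\kappa_{do(a)}$ with ordinary conditioning on $a$.

```python
INF = float("inf")

# Assumed prior ranks for the parent variable (prev = a at time t-1).
PRIOR = {False: 0, True: 1}


def k_cur_given_prev(cur, prev):
    """Persistence: keeping the previous value is unsurprising, changing costs 1."""
    return 0 if cur == prev else 1


def kappa_world(prev, cur):
    """Stratified rank: sum of family ranks, as in Eq. 5.31."""
    return PRIOR[prev] + k_cur_given_prev(cur, prev)


def kappa_do_cur_true(prev, cur):
    """Eq. 5.36 for the action do(cur = true)."""
    if not cur:
        return INF
    return kappa_world(prev, cur) - k_cur_given_prev(True, prev)


def kappa_given_cur_true(prev, cur):
    """Ordinary conditioning on the observation cur = true."""
    if not cur:
        return INF
    normalizer = min(kappa_world(p, True) for p in (False, True))
    return kappa_world(prev, cur) - normalizer


after_action = {p: kappa_do_cur_true(p, True) for p in (False, True)}
after_observation = {p: kappa_given_cur_true(p, True) for p in (False, True)}
print(after_action, after_observation)
```

After the action, the ranks of `prev` are the same as in the prior, so beliefs about the past are untouched. After the observation, `prev` being true is no longer surprising, because it would explain the observed value.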

30 The reader might recognize Eq. 5.36 in its probabilistic form where, given a probability
function $P(\omega)$ and a causal network $\Gamma$, the probability $P'(\omega)$ obtained by
manipulating variable $A$ to take on the value $a$ is given by:
$P'(\omega) = P(\omega) / P(a \mid Par_A(\omega))$ (for $\omega \models a$). This can be
easily shown from the functional definition of causal relationships as used, for example, in Pearl
and Verma [103].

Such independent movement from world to world is shown in Example 5.3,
where $\kappa(\omega)$ is depicted on the left-hand side of Table 5.6 and $\kappa_{do}(\omega)$ is depicted
on the right-hand side of Table 5.7. If $A$ has no parents (direct causes), then
$\kappa_{do(a)}$ is obtained by shifting the $\kappa$ of each $\omega \models a$ by a constant amount $\kappa(a)$,
as in ordinary conditioning, and $\kappa_{do(a)}(\omega)$ would be equal to $\kappa(\omega \mid a)$, as expected.
However, when the manipulated variable has direct causes $Par_A$, the amount of
shift would vary from world to world, depending on how surprising it would be
(in that world) to find $a$ happening naturally (without external intervention). For
instance, if $A$ is governed by persistence rules, $a(t-1) \rightarrow a(t)$, $\neg a(t-1) \rightarrow \neg a(t)$,
then worlds in which $a(t-1)$ holds will shift less than those in which $a(t-1)$ is
false, because $a(t)$ is expected to hold in the former and not in the latter. Note
that the amount of shift subtracted from $\kappa(\omega)$ is equal precisely to the fraction of
surprise $\kappa(a \mid Par_A(\omega))$ that $A = true$ contributes to $\kappa(\omega)$ and that now becomes
explained away (hence excusable) by the action $do(a)$. The generalization of
Eq. 5.36 to the case where a conjunction of literals $\varphi = a_1 \wedge \neg a_2 \wedge \ldots$ is forced to
become true or false is straightforward:

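$$
\kappa_{do(\varphi)}(\omega) = \kappa(\omega) - \left[\kappa(a_1 \mid Par_{A_1}(\omega)) + \kappa(\neg a_2 \mid Par_{A_2}(\omega)) + \ldots\right]
\tag{5.37}
$$

if $\omega \models \varphi$, and $\kappa_{do(\varphi)}(\omega) = \infty$ otherwise.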
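
The update in Eqs. 5.36 and 5.37 can be illustrated with a minimal sketch. It assumes that a stratified ranking decomposes into a sum of local surprises $\kappa(x_i \mid Par_{X_i}(\omega))$, and it uses a hypothetical two-variable persistence network (A0 is the direct cause of A1) whose names and rank values are invented for illustration only. Ordinary conditioning is included for contrast.

```python
import math
from itertools import product

# Hypothetical persistence network: A0 is the only parent of A1.
PARENTS = {"A0": (), "A1": ("A0",)}

# LOCAL[X][(value of X, parent values)] = degree of surprise (illustrative numbers).
LOCAL = {
    "A0": {(True, ()): 0, (False, ()): 0},
    "A1": {
        (True, (True,)): 0, (False, (True,)): 1,    # a(t-1) -> a(t)
        (True, (False,)): 1, (False, (False,)): 0,  # not a(t-1) -> not a(t)
    },
}

WORLDS = [dict(zip(PARENTS, vals))
          for vals in product([True, False], repeat=len(PARENTS))]


def local_kappa(var, world):
    """Surprise of var's value given the realization of its parents in world."""
    par_vals = tuple(world[p] for p in PARENTS[var])
    return LOCAL[var][(world[var], par_vals)]


def kappa(world):
    """Assumed decomposition: rank of a world is the sum of local surprises."""
    return sum(local_kappa(v, world) for v in PARENTS)


def satisfies(world, phi):
    """phi maps each constrained variable to its required truth value."""
    return all(world[v] == val for v, val in phi.items())


def kappa_do(world, phi):
    """Eq. 5.37: subtract the local surprise of every manipulated literal."""
    if not satisfies(world, phi):
        return math.inf
    return kappa(world) - sum(local_kappa(v, world) for v in phi)


def kappa_given(world, phi):
    """Ordinary conditioning kappa(world | phi), shown for contrast."""
    if not satisfies(world, phi):
        return math.inf
    k_phi = min(kappa(w) for w in WORLDS if satisfies(w, phi))
    return kappa(world) - k_phi
```

In this toy network the world in which A0 is false and A1 is true has rank 1. Acting with do(A1 = true) moves it to rank 0, so nothing is concluded about A0, whereas observing A1 = true leaves it at rank 1, so A0 = true remains the more plausible past.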

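Note that any stratified ranking $\kappa$ has at least one variable $A$ possessing a
remarkable invariance property:

$$
\kappa_{do(a)}(\omega) =
\begin{cases}
\kappa(\omega) & \text{if } \omega \models a, \\
\infty & \text{otherwise.}
\end{cases}
\tag{5.38}
$$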


Intuitively, every variable satisfying Eq. 5.38 corresponds to a sink in $\Gamma_{(X,\Delta)}$, or a
“last”  variable  in  the  “temporal”  ordering  O.  Indeed,  for  any  such  sink,  Eq.  5.38 
conveys  the  intuition  that  by  manipulating  the  last  variable  in  the  temporal  order, 
we  do  not  expect  the  past  to  change.  It  is  comforting  to  see  that  the  ramifications 
of  the  Markov  shielding  principle  coincide  with  an  alternate  reading  of  causation 
as  a  specification  of  behavior  under  external  interventions. 


5.4.2  Relation  to  KM  postulates 

Katsuno and Mendelzon [65] have formulated belief update as a transformation
between two formulas, $\psi$, representing our current set of beliefs, and $\varphi$, the new
information we wish to incorporate into that set of beliefs. The update process
is assumed to be an operator $\circ$ that takes the formula $\psi$ and transforms it into
a new formula $\psi \circ \varphi$ that syntactically represents our updated set of beliefs.
KM have introduced a set of postulates which characterize all update operators
that can be defined by the possible world approach of Winslett [127]; hence, they
are considered universal conditions for any model describing belief change due to
external actions. One such postulate, for example,

(U2) If $\psi$ implies $\varphi$, then $\psi \circ \varphi$ is equivalent to $\psi$,

says that, if the new sentence $\varphi$ is derivable from belief set $\psi$, then updating by $\varphi$
does not alter the belief set.

In our rank-based model of belief change, the current stock of beliefs is
represented by those worlds $\omega$ for which $\kappa$ is zero. Hence, $\psi$ is defined by the
union of all $\omega$ such that $\kappa(\omega) = 0$. If the new sentence ($\varphi$) is a conjunction of
literals, then the updated ranking is given by Eq. 5.37 and the new set of beliefs,
$\psi \circ \varphi$, is represented semantically by the union of all worlds $\omega$ for which
$\kappa_{do(\varphi)}(\omega)$ is zero.31

That updates resulting from Eq. 5.36 comply with the KM postulates can
be seen by the following consideration. KM have shown that their axioms are
equivalent to the existence of a function mapping each possible interpretation
world $\omega$ to a partial pre-order $\leq_\omega$, such that for any interpretation $\omega'$, if $\omega \neq \omega'$
then $\omega <_\omega \omega'$. Then the set of models for the update of a formula $\psi$ (representing
our current beliefs) by a formula $\varphi$, written $\psi \circ \varphi$, is found by taking the union
of the minimal models for $\varphi$, with respect to each one of the pre-orders defined
by the models for $\psi$:

$$
Mods(\psi \circ \varphi) = \bigcup_{\omega \in Mods(\psi)} \min(Mods(\varphi), \leq_\omega).
\tag{5.39}
$$

In other words, Eq. 5.39 asserts that the models of $\psi \circ \varphi$ can be obtained by
replacing each $\psi$-world $\omega$ with a set of $\varphi$-worlds $\omega^*$ that are nearest to $\omega$. We
shall call each such $\omega^*$ an image of $\omega$, a word coined by Lewis [77] to denote a
counterfactual alternative to Bayes conditioning. If $\omega$ is consistent with $\varphi$ then
its image $\omega^*$ is equal to $\omega$ itself, as is required by $\leq_\omega$. However, when $\omega$ is
inconsistent with $\varphi$, its image is a closest (according to $\leq_\omega$) world satisfying $\varphi$.
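
The construction in Eq. 5.39 can be sketched for a small propositional language as follows. The sketch assumes, purely for illustration, that closeness to a world is measured by Hamming distance (the number of atoms on which two worlds differ); this is not the ordering used in this chapter, only a simple stand-in that maps each world to itself at distance zero. The atom names and helper functions are hypothetical.

```python
from itertools import product

ATOMS = ("a", "b")  # hypothetical atoms, for illustration only


def all_worlds():
    """Enumerate every truth assignment over ATOMS."""
    return [dict(zip(ATOMS, vals))
            for vals in product([True, False], repeat=len(ATOMS))]


def hamming(w1, w2):
    """Assumed closeness measure: number of atoms on which the worlds differ."""
    return sum(w1[x] != w2[x] for x in ATOMS)


def km_update(psi, phi, dist=hamming):
    """Models of (psi o phi): union, over psi-worlds, of their nearest phi-worlds."""
    phi_models = [w for w in all_worlds() if phi(w)]
    if not phi_models:
        return []
    result = []
    for w in all_worlds():
        if not psi(w):
            continue
        best = min(dist(w, v) for v in phi_models)
        for v in phi_models:
            # v is an image of w when no phi-world is strictly closer to w
            if dist(w, v) == best and v not in result:
                result.append(v)
    return result


def believes_a_and_b(w):
    return w["a"] and w["b"]
```

For instance, updating the belief set "a and b" by "not a" moves its single world to the nearest world in which a is false and b is unchanged, while updating by "a" leaves the belief set as it was, in agreement with postulate (U2).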

Thus, to show compliance with the KM postulates we need to define a preorder
$\leq_\omega$, and show that for every world $\omega \models \neg\varphi$ that is currently assigned $\kappa(\omega) = 0$,
Eq. 5.37 takes each image $\omega^*$ of $\omega$ and moves it toward $\kappa_{do(\varphi)}(\omega^*) = 0$. We shall
construct such a preorder and show that, moreover, in an image world $\omega^*$, every
term $\kappa(x_i \mid Par_{X_i}(\omega^*)) > 0$ represents a violation of expectation that would be
totally excusable were it caused by an external intervention such as $\varphi$. Intuitively,
the image world corresponds to a scenario in which all the unexpected events are
attributed to the intervention of $\varphi$ but otherwise the world follows its natural,
unperturbed course as dictated by the prediction of the causal theory.

31 Updates involving disjunctions require special treatment. If they are to be interpreted as a
license to effect any change satisfying the disjunction, then the final state of belief is the union,
taken over all disjuncts, of worlds that drift to $\kappa = 0$. In this interpretation, the instruction
“make sure the box is painted either blue or white” will leave the box color unknown, even
knowing that the box was white initially (contrary to the postulate (U2) of KM). However, if
the intention is to effect no change as long as the disjunctive condition is satisfied, then the
knowledge base should be augmented with an observation-dependent strategy “$do(\varphi)$ when $\varphi$
is not satisfied”, instead of using the pure action $do(\varphi)$. Conditioning on such a strategy again
yields a belief set consistent with the KM postulates. The first interpretation is useful for
discrediting earlier observations, for example, “I am not sure the employee’s salary is 50K; it
could be anywhere between 40K and 60K”.

It is not hard to see that the image $\omega^*$ as described above is indeed a minimal
element in the order $<_\omega$, defined as follows:

Definition 5.13 (World orderings) Let $O = X_1, X_2, \ldots, X_n$ be any order of
the variables that is consistent with the DAG $\Gamma_{(X,\Delta)}$. Given three worlds $\omega$, $\omega_1$,
and $\omega_2$, we say that $\omega_1 <_\omega \omega_2$ iff the following conditions hold:

1. $\omega$ disagrees with $\omega_2$ on a literal that is earlier (in $O$) than any literal on
which $\omega$ disagrees with $\omega_1$.

2. If a tie occurs, then $\omega_1 <_\omega \omega_2$ if $\kappa(\omega_1) < \kappa(\omega_2)$.


□ 


Theorem 5.14 Let $\psi$ be a wff representing a set of beliefs. Let $\kappa$ be a ranking
such that $\omega \in Mods(\psi)$ iff $\kappa(\omega) = 0$. Let $\varphi$ represent a conjunction of literals,
and let $\kappa_{do(\varphi)}$ be the ranking that results from updating $\kappa$ by $\varphi$ as shown in Eq. 5.36
such that $\omega^* \in Mods(\psi \circ \varphi)$ iff $\kappa_{do(\varphi)}(\omega^*) = 0$. Then

$$
Mods(\psi \circ \varphi) = \bigcup_{\omega \in Mods(\psi)} \min(Mods(\varphi), \leq_\omega).
\tag{5.40}
$$


5.4.3  Related  work 

The connection between belief update and theories of action was noted by Winslett
in [127] and has been elaborated more recently by del Val and Shoham [22] using
the situation calculus. In fact, del Val and Shoham [22] showed that the KM
postulates can be derived from their formulation of actions in the situation calculus,
as they are derived from the theory presented in this chapter. The interesting
power of these postulates is that they cover a wide variety of such formulations,
from a simple theory such as the one introduced here to the intricate machinery
of the situation calculus. Due to their broad generality, the KM postulates should
not be taken as a complete characterization of action-based updates, but merely
as a useful norm of coherence on the resulting belief change. The analysis in this
chapter offers the KM postulates an intuitive, model-theoretic support that is
well grounded in probability theory, and is accompanied by a concrete characterization
of causation and action. It also offers a simple unification of revision
and update, since both are embodied in a conditioning operator, the former by
conditioning on observations and the latter by conditioning on actions.

Grahne et al. [56] showed that revision could be expressed in terms of an
update  operator  in  a  language  of  introspection  (intuitively,  observing  a  piece  of 
evidence  has  the  same  effect  as  causing  the  observer  to  augment  her  beliefs  by 
that  very  evidence).  The  analysis  in  this  chapter  shows  that  the  converse  is  also 
true:  belief  updates  can  be  expressed  in  terms  of  a  conditioning  operator,  which 
is  normally  reserved  for  belief  revision.  The  intuition  is  that  acting  to  produce 
a  certain  effect  yields  the  same  beliefs  as  observing  that  action  performed.  This 
translation  is  facilitated  by  the  special  status  that  the  added  action  =>  effect 
rules  enjoy  in  stratified  ranking,  where  actions  are  always  represented  as  root 
nodes,  independent  of  all  other  events  except  their  consequences.  This  ensures 
that  the  immediate  effects  of  those  actions  are  explained  away  and  do  not  reflect 
back  on  other  events  in  the  past.  It  is  this  stratification  that  produces  the  desired 
distinction  between  observing  an  action  produce  an  effect  and  observing  the  effect 
without  the  action.32 

32 Note that update cannot be expressed in terms of the AGM operators of revision and
contraction, because it is impossible to simulate with these operators the acceptance of a new
conditional $do(\varphi) \Rightarrow \varphi$, so that the acceptance of $do(\varphi)$ is treated differently than the acceptance
of $\varphi$. Similarly, update cannot be formulated as a transformation on rankings such as Spohn’s
conditioning because the identity of the image world $\omega^*$ cannot be described in terms of the
initial ranking alone; it requires the causal theory $\Delta$. Two different theories, $\Delta_i$ and $\Delta_j$, may
give rise to the same ranking $\kappa$, and still require two very different updates.

5.5 Discussion

Extensions to the formalism proposed in this chapter should include efficient decision
procedures for c-consistency and c-entailment, and a complete proof theory
for c-entailment. Also of interest are notions of entailment based on strategies
for completing the information provided by the rules in a network $\Delta$. In Section
5.3.3 we presented one such strategy based on the most normal completion
proposed in Chapter 4. However, contrary to the case of system-Z+, this strategy
will not always yield a unique ranking for the stratified case. Further investigations
are needed to uncover classes of networks where the resulting ranking is
unique and the decision procedures tractable. The bridge that the principle of
Markov shielding establishes between probabilistic and nonmonotonic formalisms
invites insights on these issues, including efficient query answering procedures and
methods of completion (e.g., the noisy-or canonical model in Sec. 5.3.2), from the
literature on Bayes networks.

As pointed out in Section 5.3, the notion of c-entailment $\varphi \;\|\!\sim\; \psi$ should not
be understood as establishing $\varphi$ as a cause of $\psi$. The reason is best illustrated
through the following example. Consider the network $\Delta = \{True \rightarrow b\}$; it follows
from this network that $a \;\|\!\sim\; b$ where $a$ is an arbitrary proposition in the language.
By the condition of stratification, each stratified ranking $\kappa$ must comply with
$\kappa(A \wedge B) = \kappa(A) + \kappa(B)$.33 Therefore, in every stratified ranking for $\Delta$ it must
be true that

$$
\kappa(a \wedge b) < \kappa(a \wedge \neg b), \quad \text{or}
\tag{5.41}
$$

$$
\kappa(a) + \kappa(b) < \kappa(a) + \kappa(\neg b)
\tag{5.42}
$$

since the rule in $\Delta$ establishes that $\kappa(b) < \kappa(\neg b)$. Thus, the reason that $b$ is expected
given $a$ is not that $a$ is a cause for $b$; in fact, $a$ is actually independent of
$b$. $b$ is expected simply because it is true by default according to $\Delta$. Note that this
problem disappears if we make the additional requirement that $\kappa(\neg b \mid a) > \kappa(\neg b)$.
However, this definition of causation, by being based on Bayesian conditioning,
would still be subject to the classical difficulties with spurious correlations (or
hidden causes). The definition proposed here is based on the use of rules like
$do(c) \Rightarrow c$ to simulate an external manipulation of the causes. The decision on
whether $c$ is a cause of $e$ can be made based on manipulating $c$ and observing the
behavior of $e$. Thus, for example, $c$ can be identified as a cause for $e$ in the
context of a knowledge base $\Delta$, if $do(c) \;\|\!\sim\; e$ in the context of $\Delta \cup \{do(c) \Rightarrow c\}$,
but it is not the case that $do(\neg c) \;\|\!\sim\; e$ in the context of $\Delta \cup \{do(\neg c) \Rightarrow \neg c\}$. This
notion is in line with the counterfactual reading of causation in Lewis [76] where
asserting that $c$ is a cause of $e$ implies that $e$ would not have occurred if it were
not for $c$. It is also in line with the control-based reading of causation which underlies
most statistical tests for causal influences as well as the method proposed
by Pearl and Verma in [103] for discovering causality in nonexperimental studies.
This interpretation reads:

“$c$ is a cause for $e$ if an external agent interfering only with $c$ can
affect $e$.”

In nonexperimental studies the external agent is simulated by a “virtual-control”
variable, while in our formulation it is enacted by the $do$ operator; both must
comply with the Markov shielding constraint.
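
The $True \rightarrow b$ example discussed above can be checked numerically with a small sketch. The rank values are hypothetical; the only assumptions carried over from the text are that the rule makes $b$ less surprising than $\neg b$ and that stratification makes the ranks of the independent variables $A$ and $B$ add up.

```python
# Hypothetical degrees of surprise for two independent variables A and B.
k_a = {True: 2, False: 0}   # a happens to be surprising; any values would do
k_b = {True: 0, False: 1}   # True -> b: b is expected, not-b is surprising


def kappa(a, b):
    """Stratification for independent variables: kappa(A and B) = kappa(A) + kappa(B)."""
    return k_a[a] + k_b[b]


def kappa_of_a(a):
    """Rank of the proposition A = a: the minimum over the worlds satisfying it."""
    return min(kappa(a, b) for b in (True, False))


def kappa_given(b, a):
    """Conditional rank kappa(B = b | A = a) = kappa(a and b) - kappa(a)."""
    return kappa(a, b) - kappa_of_a(a)
```

With these numbers $\kappa(a \wedge b) = 2 < 3 = \kappa(a \wedge \neg b)$, so $b$ is expected given $a$, yet $\kappa(\neg b \mid a) = 1 = \kappa(\neg b)$: learning $a$ does not change the surprise of $\neg b$, which is why the stricter requirement $\kappa(\neg b \mid a) > \kappa(\neg b)$ filters out this case.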

33 Capital letters denote literal variables.

Finally, we point out that the probabilistic roots of the semantics proposed
provide bi-directional inferences for causal and evidential information, the potential
of refining pre-encoded knowledge by learning from experience, and the
usual guarantees of clarity, coherence and plausibility that accompany theories
grounded in empirical reality. Some of the other novel contributions of this chapter
are: a consistency norm for knowledge bases representing causal relationships,
and uniform and practical formulations for belief revision, belief updating, and general
reasoning about action and change.

CHAPTER 6


Concluding  Remarks 


6.1  Summary 

This dissertation is an account of a semantical and computational approach to
reasoning with incomplete and defeasible information encoded as default rules.
These rules are regarded as if-then conditional sentences allowing exceptions that
have different degrees of abnormality. Semantically, the rules are interpreted using
infinitesimal probabilities, which can be viewed as qualitative abstractions of
an agent’s experience. An equivalent semantics is provided that interprets the
rules using ranks on models, where higher ranked models stand for more surprising
(or less likely) situations. Computationally, these semantics admit effective
procedures for testing the consistency of knowledge bases containing default rules
and for computing whether (and to what degree) a given query is confirmed or
denied. The result is a model-theoretic account of plausible beliefs that, as in
classical logic, are qualitative and deductively closed and, as in probability, are
subject to retraction and to varying degrees of firmness.

The probabilistic semantics enable the introduction of principled ways for
solving some of the problems with irrelevance that plagued previous conditional-based
approaches. This is accomplished by restricting the set of rankings that
are considered admissible with a given knowledge base. At the heart of this
formulation is the concept of default priorities, namely, a natural ordering of
the conditional sentences that is derived automatically from the knowledge base
and is used to answer queries without computing explicit rankings of worlds or
formulas. As a result, some query-answering procedures (those for system-Z+; see
Chap. 4) require only a polynomial number of propositional satisfiability tests and
hence are tractable for Horn expressions. This formulation not only offers a natural
embodiment of the principles of belief revision as formulated by AGM [3], but also
allows revisions based on imprecise observations. In addition, it enables features
such as absorption of new conditional sentences and verification of counterfactual
sentences and nested conditionals.

The lack of a mechanism for distinguishing causal relationships from other
kinds of associations has been a serious deficiency in most nonmonotonic systems
[96]. This problem is solved by augmenting the basic framework of ranking
systems with a simple mechanism, called stratification, for the representation of
causal relationships, actions, and changes. A new norm of consistency for knowledge
bases containing causal rules was introduced, and applications to tasks of
prediction and explanation were shown. The addition of stratification provides
the necessary machinery for embodying belief updates and belief revision within
the same framework.

6.2  Future  Work 

The  next  three  sections  sketch  new  directions  for  research  into  extending  the 
semantical  and  computational  framework  presented  in  this  dissertation. 

6.2.1  Semantical  Extensions 

Section 4.6 briefly sketched how to interpret iterated (and embedded) conditionals
using the ranking-based semantics proposed in this dissertation. More work is
required in order to characterize, in a manner similar to the postulates in [3,
65] for belief revision and update, the process of revising a knowledge base $\Delta$
with a conditional rule $\varphi \rightarrow \psi$. Of special interest are the cases where
$\Delta \cup \{\varphi \rightarrow \psi\}$ is inconsistent. The final goal is the development of a meta-logic in
which the connective $\rightarrow$ can be treated as another connective in the underlying
language. First steps can be found in [51], where system-Z is augmented to
accept expressions of the form $\neg(\varphi \rightarrow \psi)$: “it is not the case that typically if
$\varphi$ then $\psi$”. Semantically, $\neg(\varphi \rightarrow \psi)$ is interpreted as establishing that in the
context of $\varphi$, the occurrence of $\psi$ is as surprising as, or even more unlikely than, the
occurrence of $\neg\psi$. In terms of rankings $\neg(\varphi \rightarrow \psi)$ translates into:

$$
\kappa(\varphi \wedge \psi) \geq \kappa(\varphi \wedge \neg\psi)
\tag{6.1}
$$

Procedures  for  testing  consistency  and  answering  queries  requiring  a  polynomial 
number  of  satisfiability  tests  are  also  presented  in  [51]. 

Finally, the formulation in this dissertation is strictly propositional. Of primary
interest are extensions to the first order case along the lines presented
in [73].

6.2.2 Qualitative and Quantitative Information

The connection established in this dissertation between probability theory and
qualitative forms of common sense inference provides a solid basis for combining
qualitative information in the form of linguistic quantifiers, such as “likely”, “very
likely”, “extremely likely”, etc., with numerical probabilistic and statistical knowledge.
The advantage of this proposal is that both qualitative and quantitative
information can be processed coherently under a uniform semantical interpretation,
in full conformity with the norms of probability calculus. Efforts should
concentrate on developing an architecture where computation is performed in a
parallel and distributed fashion and where precision is a function of the required
urgency of the response (anytime response). The distributed algorithms developed
by Pearl for Bayesian Networks [97] and the work by Hunter [61] on parallel
belief revision provide a good starting point, applicable to cases where a unique
stratified ranking can be established.1

This  architecture  will  have  an  immediate  impact  in  diagnosis  systems  and  in 
the  interpretation  of  sense  data  where  the  contributions  of  both  qualitative  and 
quantitative  data  are  required,  and  anytime  response  is  essential. 

Autonomous planning systems are also likely candidates to benefit from such
an architecture. However, these systems not only require the ability to reason with
defaults, evidence, and actions, but they also require the ability to reason about
what is desirable and/or difficult, according to the consequences and costs of
these actions. The trade-offs between actions, chances, and pay-offs have been
studied thoroughly in decision theory [124, 64], and their application to AI has
been emphasized recently [26, 125, 59]. In almost all formalisms proposed, judgements
about the likelihood of events are quantified by numerical probabilities and
judgements about the desirability of action consequences are quantified by utilities;
thus they are subject to the same criticisms (of the numerical approach)
that motivated the work in this thesis.2 An extension of the ranking formalism
to include a qualitative abstraction of utilities should bring the computational
and representational benefits that the approach in this dissertation presents for
reasoning with default information.

1 Hunter [61] adapts the algorithms in [97] for computing with Spohn’s OCF.

2 See [106] for an approach to default reasoning based on utilities, and [126] for a development
of better representation languages.

6.2.3 Learning

By virtue of its probabilistic semantical basis, the framework proposed in this
dissertation establishes a connection to learning, and enables us to ask not merely
how to reason with defaults but also where default rules come from. It lays the
theoretical foundations for learning systems that coherently extract conditional
rules from raw observations, integrate them with rules transmitted linguistically,
and further refine them to adapt to new changes in the environment.3

In Bayesian belief networks, the learning task separates nicely into two subtasks:
learning the parameters of the network (i.e., the conditional probabilities)
for a given network topology, and identifying the topology itself. These subtasks
are clearly not independent because the set of parameters needed depends largely
on the topology assumed, and conversely, the structure of the network is formally
dictated by the joint distribution. Yet it is more convenient to execute the learning
process in two separate phases: structure learning and parameter learning.4
The topic of parameter learning is fairly well covered in the literature on estimation
techniques [97, 119]. The task of structure learning is a more challenging
one, and has received recent attention in [97, 39, 103], where methods and algorithms
usually introduce assumptions of causality, and a preference towards
simple structures. Given the relation between stratification and Bayes networks,
it is to be expected that these methods can be adapted to the frameworks of
Chapters 4 and 5. The parameters in a Bayes network correspond to the strengths
$\delta$ of the rules, and the topology of the network corresponds to the underlying
graph structure $\Gamma$. The $do$ operator introduced in Section 5.4 can then be used
to uncover causal relations through the selective and controlled manipulation of
events.


3 In this section we regard learning as the task of finding a generic model of empirical data.
In other words, learning can be thought of as the process of acquiring an effective internal
representation for the persistent constraints in the world, i.e., generic facts and rules.

4 The advantages are discussed in [97, Chapter 8].

APPENDIX A


Proofs 


Since some of the proofs below refer to unconfirmable sets, we recall their definition:

Definition 2.16 A set $\Delta = D \cup S$ is said to be unconfirmable if one of the
following conditions is true:

1. If $D$ is nonempty, then there cannot be a defeasible sentence in $D$ that is
tolerated by $\Delta$.

2. If $D$ is empty (i.e., $\Delta = S$) then there must be a strict sentence in $S$ which
is not tolerated by $\Delta$.

Essentially, unconfirmable sets are those that violate the conditions of Theorem 2.4
below.

Theorem 2.4 Let $\Delta = D \cup S$ be a non-empty set of defeasible and strict sentences.
$\Delta$ is p-consistent iff every non-empty subset $\Delta' = D' \cup S'$ of $\Delta$ complies with one
of the following:

1. If $D'$ is not empty, then there must be at least one defeasible sentence in $D'$
tolerated by $\Delta'$.

2. If $D'$ is empty (i.e., $\Delta' = S'$), each strict sentence in $S'$ must be tolerated
by $S'$.
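
The condition of Theorem 2.4 suggests a simple test, sketched below under several assumptions that are not part of the theorem statement: sentences are represented as hypothetical (antecedent, consequent) pairs of Python predicates, toleration is decided by brute-force enumeration of truth assignments rather than by a satisfiability procedure, and the greedy peeling loop follows the nested-sequence construction used in the "if" part of the proof (tolerated defeasible sentences are removed while all strict sentences are kept).

```python
from itertools import product


def worlds(atoms):
    """Enumerate every truth assignment over the given atoms."""
    return [dict(zip(atoms, vals))
            for vals in product([True, False], repeat=len(atoms))]


def tolerated(rule, rules, atoms):
    """True if some world verifies `rule` and falsifies no sentence in `rules`."""
    ant, con = rule
    return any(
        ant(w) and con(w) and all((not a(w)) or c(w) for a, c in rules)
        for w in worlds(atoms)
    )


def p_consistent(defeasible, strict, atoms):
    """Greedy toleration test for a set of defeasible and strict sentences."""
    remaining = list(defeasible)
    while remaining:
        current = remaining + list(strict)
        ok = [r for r in remaining if tolerated(r, current, atoms)]
        if not ok:
            return False  # an unconfirmable subset has been found
        remaining = [r for r in remaining if r not in ok]
    # With no defeasible sentences left, every strict one must be tolerated by S.
    return all(tolerated(s, list(strict), atoms) for s in strict)


# Illustrative rule sets over hypothetical atoms.
PENGUIN_ATOMS = ("bird", "penguin", "fly")
PENGUIN_RULES = [
    (lambda w: w["bird"], lambda w: w["fly"]),
    (lambda w: w["penguin"], lambda w: w["bird"]),
    (lambda w: w["penguin"], lambda w: not w["fly"]),
]
CLASH_RULES = [
    (lambda w: w["bird"], lambda w: w["fly"]),
    (lambda w: w["bird"], lambda w: not w["fly"]),
]
```

On the penguin rules the first pass removes "birds fly", the second pass removes the two penguin rules, and the set is reported consistent; the two clashing rules about birds tolerate nothing and are reported inconsistent. Enumeration is exponential in the number of atoms, so this is only suitable for small examples.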

Proof of the only if part: We want to show that if there exists a non-empty
subset of $\Delta$ which is unconfirmable, then $\Delta$ is not p-consistent. The proof is
facilitated by introducing the notion of quasi-conjunction (see [2]): Given a set
of defeasible sentences $D = \{\varphi_1 \rightarrow \psi_1, \ldots, \varphi_n \rightarrow \psi_n\}$ the quasi-conjunction of $D$
is the defeasible sentence,

$$
C(D) = [\varphi_1 \vee \ldots \vee \varphi_n] \rightarrow [(\varphi_1 \supset \psi_1) \wedge \ldots \wedge (\varphi_n \supset \psi_n)]
\tag{A.1}
$$

The quasi-conjunction $C(D)$ bears interesting relations to the set $D$. In particular,
if there is a defeasible sentence in $D$ which is tolerated (by $D$) by some
model $\omega$, $C(D)$ will be verified by $\omega$. This is so because the verification of at
least one sentence of $D$ by $\omega$ guarantees that the antecedent of $C(D)$ (i.e. the
formula $[\varphi_1 \vee \ldots \vee \varphi_n]$ in Eq. (A.1)) is satisfied by $\omega$, and the fact that no sentence
in $D$ is falsified guarantees that the consequent of $C(D)$ (i.e. the formula
$[(\varphi_1 \supset \psi_1) \wedge \ldots \wedge (\varphi_n \supset \psi_n)]$ in Eq. (A.1)) is also satisfied by $\omega$. Similarly,
if at least one sentence of $D$ is falsified by a model $\omega'$, its quasi-conjunction is
also falsified by $\omega'$ since in this case, the consequent of $C(D)$ is not satisfied
by $\omega'$ (at least one of the material implications in the conjunction is falsified by
$\omega'$). Additionally, let $U_P(C(D)) = 1 - P(C(D))$ (the uncertainty of $C(D)$) where
$P(C(D))$ is the probability assigned to the quasi-conjunction of $D$ according to
Eq. (2.4); then it is shown in [1] that the uncertainty of the quasi-conjunction of
$D$ is less than or equal to the sum of the uncertainties of each of the sentences in $D$,
i.e. $U_P(C(D)) \leq \sum_i (1 - P(\psi_i \mid \varphi_i))$ where the sum is taken over all $\varphi_i \rightarrow \psi_i$ in $D$.

We are now ready to proceed with the proof. Let $\Delta' = D' \cup S'$ be a nonempty
subset of $\Delta$ where $D'$ is a subset of $D$ and $S'$ is a subset of $S$. If $\Delta'$ is unconfirmable
then one of the following cases must occur:

Case 1.- $S'$ is empty and $D'$ is unconfirmable1. In this case, the quasi-conjunction
for $D'$ is not verifiable; from Eq. (2.4), we have that for any $P$ which is proper
for $C(D')$, $P(C(D')) = 0$ and $U_P(C(D')) = 1$. It follows, by the properties of the
quasi-conjunction outlined above, that $\sum_i (1 - P(\psi'_i \mid \varphi'_i))$ over $\varphi'_i \rightarrow \psi'_i$ in $D'$ is
at least 1. If the number of sentences in $D'$ is $n \geq 1$, then,

$$
\sum_{i=1}^{n} \left(1 - P(\psi'_i \mid \varphi'_i)\right) \geq 1
\tag{A.2}
$$

$$
\sum_{i=1}^{n} P(\psi'_i \mid \varphi'_i) \leq n - 1
\tag{A.3}
$$

which implies that at least one sentence in $D'$ has probability no greater than $1 - 1/n$.
Hence, it is impossible to have $P(\psi'_i \mid \varphi'_i) \geq 1 - \varepsilon$, for every $\varepsilon > 0$, for every
defeasible sentence $\varphi'_i \rightarrow \psi'_i \in D'$. Thus, $\Delta$ is p-inconsistent.


1 This case is covered by Theorem 1.1 in [2].

Case 2.- $D'$ is empty. If $S'$ is unconfirmable, then there must be at least one
sentence $\varphi' \Rightarrow \sigma' \in S'$ such that no model $\omega'$ verifies $\varphi' \Rightarrow \sigma'$ without falsifying
another sentence in $S'$. We show by contradiction that there is no probability
assignment $P$ to the sentences in $S'$ such that $P(\sigma \mid \varphi) = 1$ for all $\varphi \Rightarrow \sigma \in S'$
and $P$ is proper for every sentence in $S'$. Assume there exists such a $P$. From
Eq. (2.4)

$$
P(\sigma \mid \varphi) = \frac{\sum_{\omega \models \varphi \wedge \sigma} P(\omega)}{\sum_{\omega \models \varphi} P(\omega)}
\tag{A.4}
$$

which immediately implies that if a model $\omega''$ falsifies any sentence $\varphi'' \Rightarrow \sigma'' \in S'$
(including $\varphi' \Rightarrow \sigma'$), then $P(\omega'')$ must be zero, else $P(\sigma'' \mid \varphi'')$ will not equal 1.
Thus, $P(\omega') = 0$ for every $\omega'$ verifying $\varphi' \Rightarrow \sigma'$ since $\omega'$ must falsify another
sentence in $S'$. But then either $P(\sigma' \mid \varphi') = 0$, or $P$ is not proper for $\varphi' \Rightarrow \sigma'$: a
contradiction. We conclude that if $S'$ is unconfirmable then $\Delta$ is p-inconsistent.

Case 3.- Neither $D'$ nor $S'$ is empty and $\Delta'$ is unconfirmable. That is, either
the quasi-conjunction $C(D')$ is not verifiable or every $\omega'$ that verifies a defeasible
sentence in $D'$ falsifies at least one sentence in $S'$. The first situation will lead
us back to case 1 while the second leads to a contradiction similar to case 2 above. In
either case, $\Delta$ is not p-consistent.

Proof  of  the  (/part:  Assume  that  every  non-empty  subset  of  A  =  D  U  S  com¬ 
plies  with  the  conditions  of  Theorem  2.4.  Then  the  following  two  constructions 
are  feasible: 

• We can construct a finite “nested decreasing sequence” of non-empty subsets of $\Delta$, namely $\Delta_1, \ldots, \Delta_m$ ($\Delta = \Delta_1$), and an associated sequence of truth assignments $\omega_1, \ldots, \omega_m$ such that $\omega_i$ satisfies all the sentences in $\Delta_i$ and verifies at least one defeasible sentence in $\Delta_i$, and the sets in the sequence present the following characteristics:

1. $\Delta_{i+1}$ is the proper subset of $\Delta_i$ consisting of all the sentences of $D_i$ not verified by $\omega_i$, for $i = 1, \ldots, m - 1$, plus the sentences in $S$.

2. All sentences in $D_m$ are verified by $\omega_m$.

• We can construct a sequence $\omega_{m+1}, \ldots, \omega_n$ that will confirm $\Delta_{m+1} = S$. That is, the sequence $\omega_{m+1}, \ldots, \omega_n$ will verify every sentence in $S$ without falsifying any. We will associate with $\omega_{m+1}, \ldots, \omega_n$ the “nested decreasing sequence” $\Delta_{m+1}, \ldots, \Delta_n$ where $\Delta_{i+1}$ is the proper subset of $\Delta_i$ consisting of all the sentences of $S_i$ not verified by $\omega_i$ for $i = m + 1, \ldots, n$.

We can now assign probabilities to the truth assignments $\omega_1, \ldots, \omega_n$ in the following way:

For $i = 1, \ldots, n - 1$

$$P(\omega_i) = \varepsilon^{i-1}(1 - \varepsilon) \tag{A.5}$$

and

$$P(\omega_n) = \varepsilon^{n-1} \tag{A.6}$$

We must show that, in fact, every $\varphi \to \psi$ in $D$ obtains $P(\psi \mid \varphi) \geq 1 - \varepsilon$ and that every $\varphi \Rightarrow \sigma$ in $S$ obtains $P(\sigma \mid \varphi) = 1$. Since every $\varphi \to \psi$ is verified in at least one of the members of the sequence $\Delta_1, \ldots, \Delta_n$, using Eq. (2.4) we have that for $i < n$:


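$$P(\psi_i \mid \varphi_i) \geq \frac{\varepsilon^{i-1}(1 - \varepsilon)}{\varepsilon^{i-1}(1 - \varepsilon) + \varepsilon^{i}(1 - \varepsilon) + \cdots + \varepsilon^{n-1}} = 1 - \varepsilon \tag{A.7}$$

and $P(\psi_n \mid \varphi_n) = 1$ if it is only verified by the last model when $S$ is originally empty. Finally, since no $\varphi \Rightarrow \sigma$ in $S$ is ever falsified by the sequence of truth assignments $\omega_1, \ldots, \omega_n$ and each and every $\varphi \Rightarrow \sigma$ is verified at least once, it follows from Eq. (2.4) and the process by which we assigned probabilities to $\omega_1, \ldots, \omega_n$ that indeed $P(\sigma \mid \varphi) = 1$ for every $\varphi \Rightarrow \sigma \in S$. □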
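The probability assignment of Eqs. (A.5) and (A.6) can be checked numerically. The following Python sketch is illustrative only: the names `assign_probabilities` and `tail_ratio` are hypothetical and do not appear in the original text. Using exact rational arithmetic, it confirms that the weights sum to 1 and that the ratio used as the lower bound in Eq. (A.7) equals $1 - \varepsilon$.

```python
from fractions import Fraction

def assign_probabilities(n, eps):
    """Weights of Eqs. (A.5)-(A.6) for the truth assignments w_1, ..., w_n."""
    # P(w_i) = eps^(i-1) * (1 - eps) for i = 1, ..., n-1
    probs = [eps ** (i - 1) * (1 - eps) for i in range(1, n)]
    # P(w_n) = eps^(n-1)
    probs.append(eps ** (n - 1))
    return probs

def tail_ratio(probs, i):
    """P(w_i) divided by the total mass of w_i, ..., w_n (i is 1-indexed)."""
    return probs[i - 1] / sum(probs[i - 1:])

eps = Fraction(1, 100)
probs = assign_probabilities(5, eps)
total = sum(probs)
ratios = [tail_ratio(probs, i) for i in range(1, 5)]
```

With these values, `total` is exactly 1 and every entry of `ratios` is exactly `1 - eps`, independently of `n`.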

Corollary 2.5 $\Delta = D \cup S$ is p-consistent iff we can build an ordered partition of $D = [D_1, D_2, \ldots, D_n]$ where:

1. For all $1 \leq i \leq n$, each sentence in $D_i$ is tolerated by $S \cup \bigcup_{j=i+1}^{n} D_j$.

2. Every sentence in $S$ is tolerated by $S$.

Proof: If $\Delta$ is p-consistent, by Theorem 2.4 we must be able to find a tolerated defeasible sentence in every subset $\Delta' = D' \cup S'$ (of $\Delta$) where $D'$ is nonempty, and it follows that the construction of the ordered partition $D = [D_1, D_2, \ldots, D_n]$ is possible. Similarly, by Theorem 2.4, if $\Delta$ is p-consistent every strict sentence in $S$ must be tolerated by $S$. On the other hand, if both conditions in the corollary hold, we use the set of models ($\omega_i$) that renders the sentences in each $D_i$ tolerated by the set $S \cup \bigcup_{j=i+1}^{n} D_j$ to construct a high probability model for $\Delta$, following the probability assignments of Eqs. A.5 and A.6. □
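The layered construction behind condition 1 of the corollary can be sketched by brute force over truth assignments. The Python code below is a minimal illustration under stated assumptions: rules are represented as hypothetical (antecedent, consequent) pairs of Python predicates over a world, toleration is tested by enumerating all worlds rather than by a satisfiability procedure, and only condition 1 is checked (the separate test that every strict sentence is tolerated by $S$ is omitted). The names `tolerated` and `ordered_partition` are ours.

```python
from itertools import product

def worlds(atoms):
    """Enumerate every truth assignment over the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def tolerated(rule, rules, atoms):
    """True if some world verifies `rule` and falsifies no rule in `rules`."""
    ante, cons = rule
    for w in worlds(atoms):
        if ante(w) and cons(w) and all((not a(w)) or c(w) for a, c in rules):
            return True
    return False

def ordered_partition(defeasible, strict, atoms):
    """Return the layers as lists of rule indices, or None if some stage has no tolerated rule."""
    remaining = list(range(len(defeasible)))
    partition = []
    while remaining:
        current = [defeasible[i] for i in remaining] + list(strict)
        layer = [i for i in remaining if tolerated(defeasible[i], current, atoms)]
        if not layer:
            return None  # an unconfirmable subset was found
        partition.append(layer)
        remaining = [i for i in remaining if i not in layer]
    return partition

# Hypothetical example: birds fly, penguins are birds, penguins do not fly.
penguin_rules = [
    (lambda w: w["b"], lambda w: w["f"]),
    (lambda w: w["p"], lambda w: w["b"]),
    (lambda w: w["p"], lambda w: not w["f"]),
]
# Hypothetical example: a -> b together with a -> not b.
conflicting_rules = [
    (lambda w: w["a"], lambda w: w["b"]),
    (lambda w: w["a"], lambda w: not w["b"]),
]
penguin_layers = ordered_partition(penguin_rules, [], ["b", "f", "p"])
conflict_layers = ordered_partition(conflicting_rules, [], ["a", "b"])
```

In the first example the rule about birds forms the first layer and the two rules about penguins form the second; in the second example no rule is tolerated, so no partition exists.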

Theorem 2.8 If $\Delta$ is p-consistent, $\Delta$ p-entails $\varphi' \to \psi'$ iff $\varphi' \to \neg\psi'$ is substantively inconsistent with respect to $\Delta$.

Proof of the only if part: (If $\Delta$ p-entails $\varphi' \to \psi'$ then $\varphi' \to \neg\psi'$ is substantively inconsistent with respect to $\Delta$.) Let $\Delta \models_p \varphi' \to \psi'$. From the definition of p-entailment (Def. 2.7), for all $\varepsilon > 0$ there exists a $\delta > 0$ such that for all $P \in \mathcal{P}_{\Delta,\delta}$ which are proper for $\Delta$ and $\varphi' \to \psi'$, $P(\neg\psi' \mid \varphi') \leq \varepsilon$. This means that for all proper probability assignments $P$ for $\Delta$ and $\varphi' \to \psi'$,² the sentence $\varphi' \to \neg\psi'$ gets an arbitrarily low probability whenever all defeasible sentences in $\Delta$ can be assigned arbitrarily high probability and all strict sentences in $\Delta$ can be assigned probability equal to 1. Thus $\varphi' \to \neg\psi'$ is substantively inconsistent with respect to $\Delta$.


² Note that from the definition of p-entailment there must exist at least one $P$ proper for $\Delta$ and $\varphi' \to \psi'$.

Proof of the if part: (If $\varphi' \to \neg\psi'$ is substantively inconsistent with respect to $\Delta$ then $\Delta$ p-entails $\varphi' \to \psi'$.) Let $\varphi' \to \neg\psi'$ be substantively inconsistent with respect to $\Delta$. From Theorem 2.4, we know that there must be a subset $\Delta'$ of $\Delta \cup \{\varphi' \to \neg\psi'\}$ that is unconfirmable. Furthermore, since $\Delta$ is p-consistent, $\Delta' = \Delta'' \cup \{\varphi' \to \neg\psi'\}$. Let $\mathcal{P}_s$ stand for the set of probability distributions that are proper for $\Delta$ and $\varphi' \to \neg\psi'$ such that if $P \in \mathcal{P}_s$, then $P(\sigma \mid \varphi) = 1$ for all $\varphi \Rightarrow \sigma$ in $\Delta$.³ We will consider two cases depending on the structure of $\Delta''$:

Case 1.- $\Delta''$ does not include any defeasible sentences. From Theorem 2.4, we know that $\varphi' \to \neg\psi'$ cannot be tolerated by $\Delta''$ for otherwise $\Delta'$ wouldn't be inconsistent. It follows from Eq. 2.4 (probability assignment) that $P(\neg\psi' \mid \varphi') = 0$ for all $P \in \mathcal{P}_s$. Thus, $P(\psi' \mid \varphi') = 1$ in all $P \in \mathcal{P}_s$ and since any probability distribution that is in $\mathcal{P}_{\Delta,\varepsilon}$ must also belong to $\mathcal{P}_s$, it follows from the definition of p-entailment that $\Delta \models_p \varphi' \to \psi'$.

Case 2.- $\Delta''$ includes defeasible sentences and a possibly empty set of strict sentences. Since $\Delta'' \cup \{\varphi' \to \neg\psi'\}$ is unconfirmable, we have from the proof of Theorem 1 that for all probability distributions $P \in \mathcal{P}_s$,

$$\sum_{\varphi \to \psi \in \Delta''} U_P(\varphi \to \psi) + U_P(\varphi' \to \neg\psi') \geq 1 \tag{A.8}$$

which implies that

$$\sum_{\varphi \to \psi \in \Delta''} U_P(\varphi \to \psi) \geq 1 - U_P(\varphi' \to \neg\psi') = U_P(\varphi' \to \psi') \tag{A.9}$$

Since $U_P(\varphi \to \psi) = 1 - P(\varphi \to \psi)$ and $U_P(\varphi' \to \psi') = 1 - P(\varphi' \to \psi')$, Eq. (A.9) says that $1 - P(\varphi' \to \psi')$ can be made arbitrarily small by requiring the values $1 - P(\varphi \to \psi)$ for $\varphi \to \psi \in D$ to be sufficiently small and the values of $P(\sigma \mid \varphi)$ to be 1 for all $\varphi \Rightarrow \sigma \in S$. This is equivalent to saying that $\Delta \models_p \varphi' \to \psi'$. □

³ We know that $\mathcal{P}_s$ is not empty since $\Delta \cup \{\varphi' \to \mathit{True}\}$ must be p-consistent according to Def. 2.6. In the case where $\Delta$ does not contain any strict sentences, $\mathcal{P}_s$ simply denotes all probability distributions that are proper for $\Delta \cup \{\varphi' \to \mathit{True}\}$.

Theorem 2.10 If $\Delta = D \cup S$ is p-consistent, $\Delta$ strictly p-entails $\varphi' \Rightarrow \sigma'$ iff $S \cup \{\varphi' \to \mathit{True}\}$ is p-consistent and there exists a subset $S'$ of $S$ such that $\varphi' \Rightarrow \neg\sigma'$ is not tolerated by $S'$.

Proof: It follows from the proof of Theorem 2.8 (see case 1 of the if part). □

Lemma A.1 TEST_CONSISTENCY constitutes a decision procedure for testing the p-consistency of a set $\Delta$ of conditional sentences.

Proof: If the procedure stops at either line 4 or line 9, an unconfirmable subset is found, and by Theorem 2.4 the set of sentences is p-inconsistent. If, on the other hand, the procedure reaches line 10, the order in which the sentences are tolerated can be used to build a high probability model for $\Delta$ using the construction (of the “nested decreasing sequence”) in the proof of Theorem 2.4, and $\Delta$ must therefore be p-consistent. □

Theorem 2.13 The worst case complexity of testing consistency (or entailment) is bounded by $[\mathcal{PS} \times (|D|^2/2 + |S|)]$ where $|D|$ and $|S|$ are the number of defeasible and strict sentences respectively, and $\mathcal{PS}$ is the complexity of propositional satisfiability for the material counterpart of the sentences in the database.

Proof: Given that TEST_CONSISTENCY constitutes a decision procedure for p-consistency (see Lemma A.1 above), a complexity bound for this procedure will be an upper bound for the problem of deciding p-consistency. To assess the time complexity of TEST_CONSISTENCY, note that the WHILE-loop of line 6 will be executed $|S|$ times in the worst case, and each time we must do at most $\mathcal{PS}$ work to test the satisfiability of $S - \{s\}$; thus, its complexity is $|S| \times \mathcal{PS}$. In order to find a tolerated sentence $d : \varphi \to \psi$ in $D'$, we must test at most $|D'|$ times (once for each sentence $d \in D'$) for the satisfiability of the conjunction of $\varphi \wedge \psi$ and the material counterparts of the sentences in $S \cup D' - \{d\}$. However, the size of $D'$ is decremented by at least one sentence in each iteration of the WHILE-loop in line 2; therefore the number of times that we test for satisfiability is $|D| + (|D| - 1) + (|D| - 2) + \ldots + 1$, which is of order $|D|^2/2$. Thus, the overall time complexity is $O[\mathcal{PS} \times (|D|^2/2 + |S|)]$. □
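As a rough check of the counting argument in this proof, the Python sketch below tallies the worst-case number of satisfiability tests. The function name `max_satisfiability_tests` is hypothetical. The arithmetic series for the defeasible part has the closed form $|D|(|D| + 1)/2$, whose leading term is $|D|^2/2$; each test is assumed to cost at most one call to a propositional satisfiability procedure.

```python
def max_satisfiability_tests(num_defeasible, num_strict):
    """Worst-case number of satisfiability tests, following the count in the proof."""
    # Defeasible part: |D| + (|D| - 1) + ... + 1 tests, since D' shrinks
    # by at least one sentence per iteration.
    defeasible_tests = sum(range(1, num_defeasible + 1))
    # Strict part: one test per strict sentence.
    strict_tests = num_strict
    return defeasible_tests + strict_tests

tests_small = max_satisfiability_tests(4, 3)
tests_large = max_satisfiability_tests(100, 10)
```

For four defeasible and three strict sentences this gives 10 + 3 tests; for one hundred defeasible and ten strict sentences it gives 5050 + 10.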

Theorem 2.24 If the set $\Delta$ is acyclic and of Horn form, wp2-implication can be decided in polynomial time.

Proof: The proof of this theorem requires a short review of some results from [25], since the procedure for deciding wp2-implication is based on one of the algorithms presented in that paper. Given a set $\mathcal{H}$ of Horn clauses, Dowling and Gallier define an auxiliary graph $G_{\mathcal{H}}$ to represent the set $\mathcal{H}$, and reduce the problem of finding a truth assignment satisfying the sentences in $\mathcal{H}$ to that of finding a pebbling on the graph using a breadth first strategy. We first describe these concepts more precisely and then apply them to the problem at hand:

Definition A.2 ([25]) Given a set $\mathcal{H}$ of Horn clauses, $G_{\mathcal{H}}$ is a labeled directed graph with $N + 2$ nodes (a node for each propositional letter occurring in $\mathcal{H}$, a node for true and a node for false) and a set of labels $[M]$. It is constructed with $i$ taking values in $[M]$ as follows, depending on the form of the $i$th Horn formula in $\mathcal{H}$:

1. If it is a positive literal $q$, there is an edge from true to $q$ labeled $i$.

2. If it is of the form $\neg p_1 \vee \ldots \vee \neg p_n$, there are $n$ edges from $p_1, \ldots, p_n$ to false labeled $i$.

3. If it is of the form $\neg p_1 \vee \ldots \vee \neg p_n \vee q$, there are $n$ edges from $p_1, \ldots, p_n$ to $q$ labeled $i$.

A node $q$ in $G_{\mathcal{H}}$ can be pebbled if and only if for some label $i$, all sources of incoming edges labeled $i$ are pebbled. The node true is considered to be pebbled. A pebbled path is a path on the graph such that all its nodes are pebbled. Given the correspondence between a Horn rule $h_i$ and the set of $i$-labeled edges in the graph, we are going to use both terms (edge and rule) interchangeably. Thus, eliminating a rule $h_i$ should be understood as removing the set of $i$-labeled edges from the graph. Similarly, a pebbled rule will indicate that the associated nodes in the graph are pebbled, and so on. A graph $G_{\mathcal{H}}$ is considered to be completely pebbled if and only if all nodes that remain unpebbled have at least one incoming edge with a source that cannot be pebbled; i.e., there cannot be a pebbled path from true to that node.

Lemma A.3 ([25]) Let $\mathcal{H}$ be a set of Horn clauses and let $G_{\mathcal{H}}$ be its associated graph. $\mathcal{H}$ is unsatisfiable iff there is a pebbling in $G_{\mathcal{H}}$ from true to false.
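The pebbling test of Lemma A.3 can be illustrated with a small Python sketch. This is a naive fixed-point loop written for clarity, not the breadth first algorithm of [25], and the clause encoding is an assumption of this sketch: each Horn clause is a hypothetical (premises, conclusion) pair, a positive literal $q$ is encoded with the single premise `"true"`, and a clause of the form $\neg p_1 \vee \ldots \vee \neg p_n$ has the conclusion `"false"`.

```python
def pebble(clauses):
    """Return the set of nodes that can be pebbled, starting from the node 'true'."""
    pebbled = {"true"}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in clauses:
            # A node is pebbled when, for some clause (label), all sources
            # of its incoming edges are already pebbled.
            if conclusion not in pebbled and all(p in pebbled for p in premises):
                pebbled.add(conclusion)
                changed = True
    return pebbled

def is_unsatisfiable(clauses):
    """A Horn set is unsatisfiable iff the node 'false' gets pebbled."""
    return "false" in pebble(clauses)

# p,  p -> q,  and (not p or not q)
unsat_clauses = [(["true"], "p"), (["p"], "q"), (["p", "q"], "false")]
# p,  p -> q,  and (not p or not r)
sat_clauses = [(["true"], "p"), (["p"], "q"), (["p", "r"], "false")]
```

In the first clause set the pebbling reaches false through p and q; in the second, r is never pebbled, so false is not reached.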

This lemma and the existence of an $O(N^2)$ algorithm for deciding satisfiability are proven in [25] ($N$ represents the number of occurrences of literals in the set of clauses). We now prove a couple of lemmas regarding a polynomial procedure for deciding whether a conditional sentence $x$ is weakly inconsistent with respect to a set $\Delta$. Recall that by the definition of wp2-implication (Def. 2.22), once we have identified a sentence as weakly inconsistent, its negation is wp2-implied. The lemma below shows a simple test for deciding whether a particular Horn sentence $h$ is essential for the unsatisfiability of some set $\mathcal{H}$:

Lemma A.4 Let $G_{\mathcal{H}}$ be an acyclic graph representing the set $\mathcal{H}$ of Horn clauses. Assume that $\mathcal{H}$ is unsatisfiable and that $G_{\mathcal{H}}$ is completely pebbled. Let $h \in \mathcal{H}$ be a Horn clause such that both the antecedent and consequent of $h$ are pebbled in $G_{\mathcal{H}}$, and assume that there is a pebbled path from the consequent of $h$ to false. Then there exists a nonempty subgraph $G'_{\mathcal{H}}$ of $G_{\mathcal{H}}$ containing $h$ such that $G'_{\mathcal{H}}$ is unsatisfiable but $G'_{\mathcal{H}} - \{h\}$ is satisfiable.

We show the correctness of this lemma by constructing the graph $G'_{\mathcal{H}}$. The idea is to eliminate from $G_{\mathcal{H}}$ all the alternative pebbled paths to false, and leave $G'_{\mathcal{H}}$ with only the path that goes through the rule $h$, together with those necessary to render this path pebbled. First, we select one pebbled path from true to false that goes through $h$ (by the assumptions of the lemma, we know that there is at least one). Next, we eliminate any rule that reaches false directly (i.e., of form 2 in Def. A.2) that is not in the selected path. We now traverse the selected path “backwards” from false to the node representing the consequent of $h$, and remove any incoming edges that are not necessary to render this path pebbled. Note that we can guarantee to have eliminated alternative paths to false. The only way this construction can fail is if we have removed some paths that pebble the antecedents of $h$ (in which case $G'_{\mathcal{H}}$ would be satisfiable), but this can only happen if there is a cycle in the graph involving $h$, and this possibility is ruled out by the assumption of acyclicity. Since completing the pebbling of a graph is no worse than testing for satisfiability, and searching for a pebbled path from one node to another can also be done by a breadth first search algorithm, it follows that the test outlined in Lemma A.4 can be performed in polynomial time. This test constitutes the basis of a procedure for deciding weak inconsistency:

Lemma A.5 Given a set $\Delta$ which is of Horn form and acyclic, deciding whether a sentence is weakly inconsistent with respect to $\Delta$ requires polynomial time.

Given a set $\Delta$ and a sentence $x$, we first apply the consistency test of Section 2.4 to $\Delta \cup \{x\}$ in order to find an unconfirmable subset $\Delta_u$. If none can be found or the sentence $x$ does not belong to $\Delta_u$, we can assert that $x$ is not weakly inconsistent with respect to $\Delta$. In the first case $\Delta \cup \{x\}$ is consistent, and in the second case $x$ does not belong to any inconsistent subset of $\Delta \cup \{x\}$. Once $\Delta_u$ is found (and $x \in \Delta_u$), we systematically complete the pebbling of the associated graph $G_{\Delta_u}$ starting from each one of the antecedents of the sentences in $\Delta_u$. If in one of these pebblings the sentence $x$ complies with the requirements of the test outlined in Lemma A.4, then $x$ is weakly inconsistent. Note that all the steps involved require polynomial time with respect to $N$ (i.e., the number of occurrences of literals in the set of clauses). Since a procedure for deciding whether a sentence is weakly inconsistent gives us a procedure for wp2-implication (see Def. 2.22), we have essentially proven Theorem 2.24. □

We remark that these results are relevant not only to nonmonotonic reasoning but to any application involving propositional entailment.

Theorem 3.3 A set $D$ is consistent (in the sense of Def. 3.2) iff $D$ is p-consistent.

Proof: By Theorem 2.4, if $D$ is p-consistent then there exists at least one tolerated rule in every nonempty subset $D' \subseteq D$. It follows that we can use the same construction used in the proof of Theorem 2.4 (see Eqs. A.5 and A.6), and build a probability function parameterized on $\varepsilon$ such that for each $\varphi_i \to \psi_i \in D$,

$$\lim_{\varepsilon \to 0} P_\varepsilon(\psi_i \mid \varphi_i) = 1 \tag{A.10}$$

(see Eq. A.7), and it follows that $D$ is consistent according to Definition 3.2.

On the other hand, by Theorem 2.4, if $D$ is not p-consistent, then there exists a subset $D'$ where no default rule is tolerated. We show that if this is the case, there cannot be an admissible ranking for $D$. Following Proposition 3.8, $D$ is inconsistent (according to Definition 3.2), and the other direction of Theorem 3.3 holds. We reason by contradiction: Assume that there is no tolerated rule in $D' \subseteq D$ and there is an admissible ranking $\kappa'$ for $D$. Let us define a characteristic possible world for a rule to be a possible world with minimal ranking verifying the rule. Since there is no tolerated rule in $D'$, we know that any characteristic possible world $\omega_1$ for rule $r_1 \in D'$ must falsify another rule $r_2 \in D'$. By the admissibility of $\kappa'$ the following must hold

$$\kappa'(\omega_2) < \kappa'(\omega_1) \tag{A.11}$$

where $\omega_2$ is a characteristic possible world for $r_2$. By the same token, $\omega_2$ must falsify another rule in $D'$, say $r_3$, and we can insert $\kappa'(\omega_3)$⁴ in the chain of Eq. A.11:

$$\kappa'(\omega_3) < \kappa'(\omega_2) < \kappa'(\omega_1) \tag{A.12}$$

We can continue to expand the chain in this fashion and get

$$\kappa'(\omega_n) < \kappa'(\omega_{n-1}) < \ldots < \kappa'(\omega_2) < \kappa'(\omega_1) \tag{A.13}$$

Note that if at any point in the construction of this chain, a possible world falsifies a rule that has a characteristic possible world in the chain, we arrive at a contradiction: by the admissibility of $\kappa'$, $\kappa'(\omega') < \kappa'(\omega'')$, but since both $\omega'$ and $\omega''$ are characteristic possible worlds of the same rule it must be that $\kappa'(\omega') = \kappa'(\omega'')$. Moreover, given that $D'$ is finite, we are bound to encounter such a contradiction. □

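⁴ $\omega_3$ is a characteristic possible world for $r_3$.

Theorem 3.5 A PPD consequence relation satisfies the Logic, Cumulativity, Cases and Rational Monotony rules of inference.

Proof: Note that if $\varphi \vdash \psi$ then $P(\psi \mid \varphi) = 1$. Thus, each PPD consequence relation satisfies the Logic rule. From elementary probability equivalences,

$$P(\gamma \mid \varphi) = P(\gamma \mid \psi \wedge \varphi) P(\psi \mid \varphi) + P(\gamma \mid \neg\psi \wedge \varphi) P(\neg\psi \mid \varphi); \tag{A.14}$$

thus, $\lim_{\varepsilon \to 0} P_\varepsilon(\gamma \mid \varphi)$ approaches $\lim_{\varepsilon \to 0} P_\varepsilon(\gamma \mid \psi \wedge \varphi)$ as $\lim_{\varepsilon \to 0} P_\varepsilon(\psi \mid \varphi)$ approaches 1. Hence, each such relation satisfies Cumulativity. Again, from elementary probability equivalences we have $P(\gamma \mid \varphi \vee \psi) = P(\gamma \mid \varphi) + P(\gamma \mid \psi) - P(\gamma \mid \varphi \wedge \psi)$. Thus, $\lim_{\varepsilon \to 0} P_\varepsilon(\gamma \mid \varphi \vee \psi) \geq \lim_{\varepsilon \to 0} P_\varepsilon(\gamma \mid \varphi) + \lim_{\varepsilon \to 0} P_\varepsilon(\gamma \mid \psi) - 1$, and it follows that a PPD consequence relation also satisfies Cases. Finally, since $P(\neg\psi \mid \varphi) = P(\neg\psi \mid \gamma \wedge \varphi) P(\gamma \mid \varphi) + P(\neg\psi \mid \neg\gamma \wedge \varphi) P(\neg\gamma \mid \varphi)$, if $\lim_{\varepsilon \to 0} P_\varepsilon(\neg\psi \mid \varphi) = 0$ (i.e., $\varphi \mathrel{|\!\sim} \psi$) and $\lim_{\varepsilon \to 0} P_\varepsilon(\gamma \mid \varphi) \neq 0$, it must be the case that $\lim_{\varepsilon \to 0} P_\varepsilon(\neg\psi \mid \gamma \wedge \varphi) = 0$ (i.e., $\gamma \wedge \varphi \mathrel{|\!\sim} \psi$) and it follows that a PPD consequence relation also satisfies Rational Monotony. □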

Theorem 3.6 Every PPD consequence relation can be represented as a ranked preferential model, and every ranked preferential model with a finite non-empty state space can be represented as a PPD consequence relation.

Proof: We have shown (Theorem 3.5) that each PPD consequence relation satisfies Logic, Cumulativity, Cases and Rational Monotony, and hence by the representation theorem in [74] it can be represented as a ranked preferential model.

For the converse part, we employ essentially the same construction as used in Lemma 31 of [74], except we take pains to ensure the probability functions are polynomial in $\varepsilon$. Suppose we are given a ranked preferential model with $n$ ranks, denoted by $R_1, \ldots, R_n$. Let $\alpha_i$ be the number of states in $R_i$, for $1 \leq i \leq n$. For each state $s$,⁵ define

$$P_\varepsilon(s) = \begin{cases} [1 - \varepsilon - \cdots - \varepsilon^{n-1}] / \alpha_1 & \text{for } s \text{ in } R_1 \\ \varepsilon^{i-1} / \alpha_i & \text{for } s \text{ in } R_i,\ 2 \leq i \leq n \end{cases}$$

It is easy to see that this probability measure on states will yield a PPD with the same consequence relation as the given ranked preferential model. □
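The measure on states used in the converse part can be checked with a short Python sketch. The function name `rank_measure` and the example rank sizes are hypothetical; the code assumes the piecewise form described in the proof, with the first rank receiving $[1 - \varepsilon - \cdots - \varepsilon^{n-1}]/\alpha_1$ per state and rank $i \geq 2$ receiving $\varepsilon^{i-1}/\alpha_i$ per state, and verifies with exact rationals that the total mass is 1.

```python
from fractions import Fraction

def rank_measure(rank_sizes, eps):
    """Per-state probability for ranks R_1, ..., R_n; rank_sizes[i-1] is alpha_i."""
    n = len(rank_sizes)
    # States in R_1 share the mass left over by the higher ranks.
    first = (Fraction(1) - sum(eps ** k for k in range(1, n))) / rank_sizes[0]
    # Each state in R_i (i >= 2) gets eps^(i-1) / alpha_i.
    rest = [eps ** (i - 1) / rank_sizes[i - 1] for i in range(2, n + 1)]
    return [first] + rest

rank_sizes = [2, 3, 1]
eps = Fraction(1, 10)
measure = rank_measure(rank_sizes, eps)
total_mass = sum(p * a for p, a in zip(measure, rank_sizes))
```

Each per-state probability is a polynomial in `eps` divided by a constant, and the mass of rank $i$ is of order $\varepsilon^{i-1}$, so lower ranks dominate as `eps` tends to 0.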

⁵ For technical reasons, Lehmann and Magidor [74] define ranked preferential models in terms of a set of states $S$, and a function $l$ mapping states $s \in S$ to possible worlds $\omega \in \Omega$. A state $s$ will satisfy a formula $\varphi$ if and only if $l(s) \models \varphi$. For the purposes of this proof we define $P_\varepsilon$ as a probability measure on a set of states, and define $P_\varepsilon(\omega)$ as $P_\varepsilon(\omega) = \sum_{l(s) = \omega} P_\varepsilon(s)$. The rest of our definitions and results hold without further modifications.

Theorem 3.10 Given a consistent $D$, $\varphi \mathrel{|\!\sim} \sigma$ iff $D \models_p \varphi \to \sigma$.

Proof: We recall the definition of p-entailment (Def. 2.7). Given a positive real number $\varepsilon$, we say that a probability measure $P$ $\varepsilon$-satisfies a default rule $\varphi \to \psi$ if $P(\psi \mid \varphi) \geq 1 - \varepsilon$. According to Definition 2.7, a default $\varphi \to \sigma$ is p-entailed by a set $D$ if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that every probability measure that $\delta$-satisfies each rule in $D$ will $\varepsilon$-satisfy $\varphi \to \sigma$. Now suppose that $\varphi \to \sigma$ is p-entailed by a set $D$. Let $\varepsilon > 0$ be arbitrary, and let $\delta$ be such that if $P$ $\delta$-satisfies each default in $D$, then it $\varepsilon$-satisfies $\varphi \to \sigma$. Let $P_\gamma$ be a PPD admissible with $D$. Then for each default $\varphi_i \to \psi_i$ in $D$, $P_\gamma$ $\delta$-satisfies $\varphi_i \to \psi_i$ for sufficiently small $\gamma$. Since $D$ is finite, we can find a single constant $K$ such that $P_\gamma$ $\delta$-satisfies every member of $D$ for $\gamma < K$. Thus, $P_\gamma$ $\varepsilon$-satisfies $\varphi \to \sigma$ for sufficiently small $\gamma$. Since $\varepsilon$ is arbitrary, we conclude that $\sigma$ is probabilistically entailed by $D$ given $\varphi$.

For the converse, suppose that $\varphi \to \sigma$ is not p-entailed by $D$. From Theorem 2.8, either $D \cup \{\varphi \to \neg\sigma\}$ is consistent, or $D \cup \{\varphi \to \mathit{True}\}$ is inconsistent. Suppose first that $D \cup \{\varphi \to \mathit{True}\}$ is inconsistent. Since $D$ is consistent, the construction in the proof of Theorem 2.4 can be used to obtain a PPD admissible with $D$. Since $D \cup \{\varphi \to \mathit{True}\}$ is inconsistent, this PPD does not satisfy $\varphi \to \mathit{True}$, and hence $\varphi \mathrel{|\!\sim} \sigma$ cannot hold in its proper consequence relation. Assume now that $D \cup \{\varphi \to \neg\sigma\}$ is consistent. Once again using the construction in the proof of Theorem 2.4, we get a PPD admissible with respect to $D \cup \{\varphi \to \neg\sigma\}$. Clearly, the induced consequence relation cannot satisfy $\varphi \mathrel{|\!\sim} \sigma$. Thus, $\sigma$ is not probabilistically entailed by $D$ given $\varphi$. □

Proposition 3.12 If $D$ is an MC set then, for all defaults $r : \varphi \to \psi \in D$, $\varphi \mathrel{|\!\sim} \psi$ is not in the consequence relation induced by $D - \{\varphi \to \psi\}$.

Proof: Note that since $D$ is an MC set, $D - \{\varphi_i \to \psi_i\}$ must also be an MC set. By the MC set property (Def. 3.11), for each rule $r_i : \varphi_i \to \psi_i \in D$ there exists a possible world $\omega_i$ such that $\omega_i$ falsifies only $r_i$ and no other rule in $D$. Thus, $\kappa^*(\omega_i)$ for the set $D - \{\varphi_i \to \psi_i\}$ must be equal to zero (no rule in $D - \{\varphi_i \to \psi_i\}$ is violated by $\omega_i$), and thus any possible world $\omega' \models \varphi_i \wedge \psi_i$ must comply with $\kappa^*(\omega') \geq \kappa^*(\omega_i)$. It follows that $\varphi_i \mathrel{|\!\sim} \psi_i$ is not in the consequence relation induced by $D - \{\varphi_i \to \psi_i\}$. □

In order to show Theorem 3.13, we require the following lemma:

Lemma A.6 Given an MC set $D$, there exists at least one $Z$ function that satisfies Eq. 3.19.

Proof: We define an operator $O$ on $Z$ functions by $O(Z) = Z'$, where

$$Z'(r_i) = 1 + \min_{\omega \models \varphi_i \wedge \psi_i} \Bigl[ \sum_{r_j \in D_\omega^-} Z(r_j) \Bigr], \qquad 1 \leq i \leq n$$

A $Z$ function satisfies Eq. 3.19 iff it is a fixed point of $O$.

Now define a sequence $\{Z_n\}$ of $Z$ functions by

$$Z_1(r) = 1 \quad \text{for every rule } r$$

and

$$Z_{n+1} = O(Z_n).$$

We will prove that, for every $r$, the sequence $\{Z_n(r)\}$ is both non-decreasing and bounded, and hence converges.

First we show that $Z_{n+1}(r) \geq Z_n(r)$ for every $r$. (We will abbreviate this as $Z_{n+1} \geq Z_n$.) To see this, note that if $Z \geq Z'$, then $O(Z) \geq O(Z')$. Clearly, $Z_2 \geq Z_1$. Applying $O$ to both sides, we get $Z_3 \geq Z_2$. This process can be repeated, showing (by induction) that $Z_{n+1} \geq Z_n$, for every $n$.

Next, we use the partition of $D$ introduced in Corollary 2.5 to show that $\{Z_n(r)\}$ is bounded above for every $r$. We can do this by induction on the tolerance set to which $r$ belongs. Clearly it is true for rules in $D_0$, since among the verifiers for such rules are possible worlds with no violations. Assume it is true for $r$ in $D_i$. Consider a rule $r$ that belongs to $D_{i+1}$. There must be at least one verifier $\omega$ of $r$ that violates only rules in $D_i$. According to the inductive hypothesis, therefore, $\sum_{r_j \in D_\omega^-} Z(r_j)$ will be bounded during every application of $O$. It follows that $\{Z_n(r)\}$ is bounded.

Since $\{Z_n\}$ converges, we can define

$$Z = \lim_{n \to \infty} Z_n.$$

Clearly $Z$ will be a fixed point of $O$. □
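The iteration in this proof can be sketched in Python. Eq. 3.19 is not reproduced in this appendix, so the sketch assumes, as the proof's wording suggests, that one application of $O$ sets each $Z'(r_i)$ to one plus the minimum, over the worlds verifying $r_i$, of the summed $Z$ values of the rules those worlds falsify. The rule encoding as (antecedent, consequent) predicates, the brute-force enumeration of worlds, the iteration cap, and the names `apply_operator` and `iterate_to_fixed_point` are all assumptions of this sketch; every rule is assumed to have at least one verifying world.

```python
from itertools import product

def apply_operator(z, rules, atoms):
    """One application of the operator O under the form assumed above."""
    new_z = []
    for ante, cons in rules:
        best = None
        for values in product([False, True], repeat=len(atoms)):
            w = dict(zip(atoms, values))
            if ante(w) and cons(w):  # w verifies the current rule
                # Sum Z over the rules that w falsifies.
                cost = sum(z[j] for j, (a, c) in enumerate(rules) if a(w) and not c(w))
                best = cost if best is None else min(best, cost)
        new_z.append(1 + best)
    return new_z

def iterate_to_fixed_point(rules, atoms, max_steps=100):
    """Start from Z_1(r) = 1 and apply O until nothing changes (or give up)."""
    z = [1] * len(rules)
    for _ in range(max_steps):
        nxt = apply_operator(z, rules, atoms)
        if nxt == z:
            return z
        z = nxt
    return None

# Hypothetical example: birds fly, penguins are birds, penguins do not fly.
example_rules = [
    (lambda w: w["b"], lambda w: w["f"]),
    (lambda w: w["p"], lambda w: w["b"]),
    (lambda w: w["p"], lambda w: not w["f"]),
]
z_fixed = iterate_to_fixed_point(example_rules, ["b", "f", "p"])
```

On this example the sequence is non-decreasing and stabilizes after one step, with value 1 for the rule about birds and 2 for each rule about penguins.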

Relying on Lemma A.6 we can now define $Z^*$ to be an arbitrarily chosen solution to Eq. 3.19. It will follow from Theorem 3.13 that this solution is unique.

Theorem 3.13 Given an MC set $D$, Procedure Z*_order computes the function $Z^*$ defined by Eqs. 3.20 and 3.21.

Proof: We first show that the relevant steps in Procedure Z*_order are well defined. By the assumption that $D$ is consistent, $D_0$ cannot be an empty set (steps 1 and 2): there must be at least one rule tolerated by $D$. For similar reasons, $\Omega$ cannot be empty in any iteration of the loop in step 3: by consistency we must be able to find a tolerated sentence in each nonempty subset of $D$. Finally, in the computation of Eq. 3.23, since $\omega$ only falsifies rules in $\mathcal{RZ}^+$, all $Z$ values for these rules are available.

We now show that $Z = Z^*$ for rules $r_0 \in D_0$. Since each $r_0$ is tolerated by $D$, there must be a possible world $\omega_0$ (for each one of these rules), such that $\omega_0$ verifies $r_0$ and $\omega_0$ satisfies $D$. Thus, each one of these possible worlds does not falsify any rules in $D$, and $\kappa^*(\omega_0) = 0$. According to Eq. 3.21, $Z^*(r_0) = 1$ for those rules and that is precisely what is computed in step 2.

The proof proceeds by induction on the iterations of loop 3; we show that for every rule $r \in \mathcal{RZ}^+$, $Z(r) = Z^*(r)$ holds as an invariant. For the basis of the induction consider the first iteration: Since $\mathcal{RZ}^+ = D_0$, then for every $r_0 \in D_0$, $Z(r_0) = Z^*(r_0)$ holds as shown above. Our objective is to show that this equality holds for the rules inserted into $\mathcal{RZ}^+$ at step 3.(c). Note that since all the values $\kappa(\omega)$ for $\omega \in \Omega$ are computed from $Z^*$-values of rules in $\mathcal{RZ}^+$ (step 3.b, Eq. 3.23), they must be equal to $\kappa^*(\omega)$. We define a characteristic possible world for a rule $r$ to be the possible world $\omega_r$ with minimal ranking $\kappa^*$ verifying $r$. Thus, $Z^*(r) = \min_{\omega \models \varphi \wedge \psi} \kappa^*(\omega) + 1 = \kappa^*(\omega_r) + 1$. We claim that $\omega^*$⁶ is a characteristic possible world for the rules outside $\mathcal{RZ}^+$ it verifies. Suppose not: Assume that there is a possible world $\omega_{r'}$ such that $\omega_{r'}$ verifies some rule $r'$ which $\omega^*$ also verifies, and $\kappa^*(\omega_{r'}) < \kappa^*(\omega^*)$. Note that $\omega_{r'}$ cannot belong to $\Omega$ since the value of $\kappa^*(\omega^*)$ is minimal with respect to the $\kappa^*$ of possible worlds in $\Omega$. It follows then that $\omega_{r'}$ must falsify a rule $r'' \notin \mathcal{RZ}^+$. Let $\omega_{r''}$ be a characteristic possible world for $r''$; then

$$\kappa^*(\omega_{r''}) < \kappa^*(\omega_{r'}) \tag{A.15}$$

Note that $\omega^*$ cannot verify $r''$, since otherwise

$$\kappa^*(\omega^*) < \kappa^*(\omega_{r'}) \tag{A.16}$$

a contradiction. By the same argument as above, $\omega_{r''} \notin \Omega$ and therefore it must falsify a rule $r''' \notin \mathcal{RZ}^+$. If $\omega_{r'''}$ is a characteristic possible world for $r'''$ we have that

$$\kappa^*(\omega_{r'''}) < \kappa^*(\omega_{r''}) < \kappa^*(\omega_{r'}) \tag{A.17}$$

$\omega_{r'}$ cannot verify $r'''$; otherwise we get the contradiction

$$\kappa^*(\omega_{r'}) < \kappa^*(\omega_{r''}) < \kappa^*(\omega_{r'}) \tag{A.18}$$

and if $\omega^*$ verifies $r'''$ we get the contradiction of Eq. A.16. $\omega_{r'''}$ cannot belong to $\Omega$ and therefore it must falsify another rule outside $\mathcal{RZ}^+$. However, given that $D$ is finite, we cannot extend the “chain” of Eq. A.17 indefinitely, and therefore we are bound to get a contradiction in the form of Eq. A.16 or Eq. A.18. Since our only assumption was that $\omega^*$ is not a characteristic possible world for the rules it verifies, that assumption must be wrong. It follows then that the value of $Z(r)$ computed in step 3.c (Eq. 3.24) must be equal to $Z^*$. For the induction step assume that the invariant holds up to the $n$th iteration. Then by the same argument used in the basis of the induction, the $\kappa(\omega)$ for $\omega \in \Omega$ are equal to $\kappa^*(\omega)$, $\omega^*$ must be a characteristic possible world for the rules $r$ outside of $\mathcal{RZ}^+$ that it verifies, and thus $Z(r) = \kappa^*(\omega^*) + 1 = Z^*(r)$. □

⁶ Recall that $\omega^*$ is a possible world in $\Omega$ with minimal value $\kappa$ (see step 3.c in Procedure Z*_order).

Theorem 4.3 A set $\Delta$ is consistent (in the sense of Def. 4.2) iff $\Delta$ is p-consistent.

Proof: By Theorem 2.4, $\Delta$ is p-consistent iff there exists at least one tolerated rule (by $\Delta'$) in every nonempty subset $\Delta'$ of $\Delta$. We first show that if there exists a tolerated rule in every nonempty subset of $\Delta$ we can always produce an admissible ranking $\kappa$. Under the stated condition, we can construct the following ordered partition $(\Delta_0, \Delta_1, \ldots, \Delta_n)$ of $\Delta$: Rules in $\Delta_0$ are tolerated by $\Delta$, rules in $\Delta_1$ are tolerated by $\Delta - \Delta_0$ and so on (see Cor. 2.5). By Def. 2.3,⁷ for each one of these $\Delta_j$, there must exist a nonempty subset $\Omega_j$ of $\Omega$ (the set of all possible worlds), such that for each rule $r_j \in \Delta_j$ there must exist a possible world $\omega_j \in \Omega_j$, where $\omega_j$ verifies $r_j$ and $\omega_j$ satisfies $\Delta$ if $j = 0$ and $\Delta - \{\Delta_0 \cup \ldots \cup \Delta_{j-1}\}$ otherwise. Thus, using these possible worlds (the possible worlds actually required to effectively build the partition of $\Delta$), we define a partition $(\Omega_0, \Omega_1, \ldots, \Omega_n, \Omega_{n+1})$ of $\Omega$, where each $\Omega_j$ contains possible worlds with the characteristics mentioned above, and $\Omega_{n+1}$ contains the possible worlds necessary to complete the partition. Let $\delta^*_j$ denote the highest $\delta$ among rules in set $\Delta_j$. We now build, in a recursive fashion, an admissible ranking $\kappa$ based on these two partitions in the following manner: If $\omega_0 \in \Omega_0$, set $\kappa(\omega_0) = 0$. Else if $\omega_j \in \Omega_j$, set $\kappa(\omega_j) = \kappa(\omega_{j-1}) + \delta^*_{j-1} + 1$. Note that each possible world $\omega_j \in \Omega_j$ is a characteristic possible world⁸ of the rule $r_j \in \Delta_j$ it verifies, and the $\kappa$-minimal possible world falsifying any rule $r_j \in \Delta_j$ must belong to the set $\Omega_{j+1}$. Thus, in order to guarantee the admissibility of $\kappa$, it is enough to show that for an arbitrary pair of possible worlds $\omega_j \in \Omega_j$ and $\omega_{j+1} \in \Omega_{j+1}$ the following relation holds:


k{uj)  +  8j  <  k(uj+i)  (A. 19) 

where  8j  can  be  any  8  among  the  rules  in  A  j.  But  this  relation  is  guaranteed  by 
the  construction  of  k  since  k(uj)  +  8*  +  1  =  /c(u>j+i),  where  8*  is  the  highest  8 
among  the  rules  in  A  j.  Therefore  k  is  admissible. 

To show the converse we reason by contradiction: Assume that there is no tolerated rule in $\Delta' \subseteq \Delta$ and there is an admissible ranking $\kappa'$ for $\Delta$ (this part of the proof is almost identical to the proof of Theorem 3.3, except for the $\delta$'s on the rules). Since there is no tolerated rule in $\Delta'$, we know that any characteristic possible world for rule $r_1 \in \Delta'$ must falsify another rule $r_2 \in \Delta'$. By the admissibility of $\kappa'$ the following must hold

$\kappa'(\omega_2) + \delta_2 < \kappa'(\omega_1)$  (A.20)

where $\omega_2$ is a characteristic possible world for $r_2$. By the same token, $\omega_2$ must falsify another rule in $\Delta'$, say $r_3$, and we can insert $\kappa'(\omega_3)$⁹ in the chain of Eq. A.20:

$\kappa'(\omega_3) + \delta_3 < \kappa'(\omega_2) + \delta_2 < \kappa'(\omega_1)$  (A.21)

We can continue to expand the chain in this fashion and get

$\kappa'(\omega_n) + \delta_n < \kappa'(\omega_{n-1}) + \delta_{n-1} < \ldots < \kappa'(\omega_2) + \delta_2 < \kappa'(\omega_1)$  (A.22)

Note that if at any point in the construction of this chain, a possible world falsifies a rule that has a characteristic possible world in the chain, we arrive at a contradiction, since by the admissibility of $\kappa'$, $\kappa'(\omega') + \delta' < \kappa'(\omega'')$, but since both $\omega'$ and $\omega''$ are characteristic possible worlds of the same rule it must be that $\kappa'(\omega') = \kappa'(\omega'')$. Moreover, given that $\Delta'$ is finite we are bound to encounter such a contradiction. □

⁷ Rules with strength $\delta$ are verified, falsified, and therefore tolerated in the same way as rules without strength $\delta$.

⁸ Recall that a possible world $\omega^+$ is said to be a characteristic possible world for rule $\varphi \to \psi$ relative to ranking $\kappa$, if $\kappa(\omega^+) = \min\{\kappa(\omega) : \omega \models \varphi \wedge \psi\}$.
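
The layered construction used in the first half of this proof (peel off the rules tolerated by what remains, and repeat) can be sketched in a few lines. This is only an illustrative, brute-force version: it enumerates all truth assignments instead of calling a satisfiability procedure, and the names `worlds`, `tolerated`, `toleration_partition`, as well as the two example rule sets, are assumptions introduced here rather than anything defined in the text.

```python
from itertools import product

def worlds(atoms):
    """Enumerate every truth assignment over the given atoms."""
    for bits in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, bits))

def tolerated(i, active, rules, atoms):
    """Rule i is tolerated by the rules indexed in `active` if some world
    verifies rule i and falsifies no rule in `active`."""
    ante, cons = rules[i]
    return any(
        ante(w) and cons(w)
        and all((not rules[j][0](w)) or rules[j][1](w) for j in active)
        for w in worlds(atoms)
    )

def toleration_partition(rules, atoms):
    """Return the layers [Delta_0, Delta_1, ...] as lists of rule indices,
    or None if some nonempty remainder contains no tolerated rule."""
    remaining, layers = list(range(len(rules))), []
    while remaining:
        layer = [i for i in remaining if tolerated(i, remaining, rules, atoms)]
        if not layer:
            return None  # no tolerated rule in a nonempty subset
        layers.append(layer)
        remaining = [i for i in remaining if i not in layer]
    return layers

# Hypothetical rule sets: (antecedent, consequent) pairs over atoms b, f, p.
atoms = ["b", "f", "p"]
penguin = [
    (lambda w: w["b"], lambda w: w["f"]),      # b -> f
    (lambda w: w["p"], lambda w: w["b"]),      # p -> b
    (lambda w: w["p"], lambda w: not w["f"]),  # p -> not f
]
clash = [
    (lambda w: w["p"], lambda w: w["f"]),      # p -> f
    (lambda w: w["p"], lambda w: not w["f"]),  # p -> not f
]
```

On the first rule set the sketch returns two layers, with `b -> f` alone in the first; on the second it returns `None`, because neither rule is tolerated by the pair.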

Proposition A.7 The ranking function $\kappa^+$ is admissible.

Proof: Given that $Z^+(r_i) = \min\{\kappa^+(\omega) : \omega \models \varphi_i \wedge \psi_i\} + \delta_i$, we can re-write the conditions for admissibility (Eq. 4.6) as

$Z^+(r_i) < \min\{\kappa^+(\omega) : \omega \models \varphi_i \wedge \neg\psi_i\}$  (A.23)

Since $\kappa^+(\omega) = \max\{Z^+(r_i) : \omega \models \varphi_i \wedge \neg\psi_i\} + 1$, it follows that $\kappa^+$ is admissible. □

Lemma A.8 The ranking $\kappa^+$ is compact.

Proof: By contradiction. Assume it is possible to lower $\kappa^+(\omega')$ of some possible world $\omega'$, where $\kappa^+(\omega') > 0$. From the definition of $\kappa^+$ (Def. 4.4), there must be a rule $r : \varphi \to \psi$ such that $\kappa^+(\omega') = Z^+(r) + 1$ (see Eq. 4.7), which implies that

$\kappa^+(\omega') = \min\{\kappa^+(\omega) : \omega \models \varphi \wedge \psi\} + \delta + 1$  (A.24)

Lowering the value of $\kappa^+(\omega')$ will violate Eq. A.24, which will imply the violation of Eq. 4.6 for rule $r$. □

⁹ $\omega_3$ is a characteristic possible world for $r_3$.

Theorem 4.7 Every consistent $\Delta$ has a unique compact ranking given by $\kappa^+$.

Proof: By Lemma A.8, $\kappa^+$ is compact. We show it is also unique. Suppose there exists some other compact ranking $\kappa$ that differs from $\kappa^+$ in at least one possible world. We will show that if there exists an $\omega'$ such that $\kappa(\omega') < \kappa^+(\omega')$ then $\kappa$ cannot be admissible, whereas if $\kappa(\omega') > \kappa^+(\omega')$, then $\kappa$ cannot be compact. Assume $\kappa(\omega') < \kappa^+(\omega')$, let $I$ be the lowest $\kappa$ value for which such inequality holds, and let $\kappa^+(\omega') = J > I$. By the definition of $\kappa^+$ (Def. 4.4), we know that there is a rule $r : \varphi \to \psi$ such that Eq. A.24 holds, and as a consequence

$\min\{\kappa^+(\omega) : \omega \models \varphi \wedge \psi\} = J - \delta - 1$  (A.25)

Since $\kappa$ is assumed to be admissible, the following must hold for rule $r$

$\kappa(\omega') \geq \min\{\kappa(\omega) : \omega \models \varphi \wedge \psi\} + \delta + 1$  (A.26)

Since $J > \kappa(\omega')$,

$J > \min\{\kappa(\omega) : \omega \models \varphi \wedge \psi\} + \delta + 1$  (A.27)

If we subtract $\delta + 1$ from both sides of this inequality and use Eq. A.25 we get

$\min\{\kappa^+(\omega) : \omega \models \varphi \wedge \psi\} > \min\{\kappa(\omega) : \omega \models \varphi \wedge \psi\}$  (A.28)

But this cannot be since $I$ was assumed to be the minimal value of $\kappa$ for which this inequality can occur, and if $\min\{\kappa(\omega) : \omega \models \varphi \wedge \psi\} > I$, then $\kappa$ is not admissible (see Eq. A.26).

Now assume that there is a nonempty set of possible worlds for which $\kappa(\omega) > \kappa^+(\omega)$, and let $I$ be the lowest $\kappa^+$ value in which $\kappa(\omega') > \kappa^+(\omega')$ for some possible world $\omega'$. We will show that $\kappa$ cannot be compact, since it will be possible to reduce $\kappa(\omega')$ to $\kappa^+(\omega')$ while keeping constant the $\kappa$ of all other possible worlds. From $\kappa^+(\omega') = I$ we know that $\omega'$ does not falsify any rule $r$ with $Z^+$ rank higher than $I - 1$. Hence, we only need to watch whether the reduction of $\kappa$ can violate rules $r$ for which $Z^+(r) < I$. For every such rule there exists a possible world $\omega$, such that $\omega$ verifies $r$ and $\kappa^+(\omega) < I$. Since for all these possible worlds $\kappa$ is assumed to be equal to $\kappa^+$, it follows that none of these rules can be violated by reducing $\kappa(\omega')$ to $\kappa^+(\omega')$. □

Theorem 4.9 The function $Z$ computed by Z+_order complies with Definition 4.4, that is, $Z = Z^+$.

Proof: We first show that the relevant steps in Procedure Z+_order are well defined. By the assumption that $\Delta$ is consistent, $\Delta_0$ cannot be an empty set (steps 1 and 2): There must be at least one rule tolerated by $\Delta$. By similar reasons, $\Delta^+$ cannot be empty in each iteration of the loop in step 3. By consistency we must be able to find a tolerated sentence in each nonempty subset of $\Delta$. Finally, in the computation of Eq. 4.10, since $\omega$ only falsifies rules in $\mathcal{RZ}^+$, all $Z$ for these rules are available.

We now show that $Z = Z^+$ for rules $r_0 \in \Delta_0$. Since each $r_0$ is tolerated by $\Delta$, there must be a possible world $\omega_0$ (for each one of these rules), such that $\omega_0$ verifies $r_0$ and $\omega_0$ satisfies $\Delta$. Thus, each one of these possible worlds does not falsify any rules in $\Delta$, and $\kappa^+(\omega_0) = 0$. According to Eq. 4.8 in Def. 4.4, $Z^+(r_0) = \delta_0$ for those rules and that is precisely what is computed in step 2.

The proof proceeds by induction on the iterations of loop 3; we show that for every rule $r \in \mathcal{RZ}^+$, $Z(r) = Z^+(r)$ holds as an invariant. For the basis of the induction consider the first iteration: Since $\mathcal{RZ}^+ = \Delta_0$, then for every $r_0 \in \Delta_0$, $Z(r_0) = Z^+(r_0)$ holds as shown above. Our objective is to show that this equality holds for the rules $r^*$ inserted into $\mathcal{RZ}^+$ at step 3.(c). Note that since all the values $\kappa(\omega_r)$ for $\omega_r$ in every $\Omega_r$ are computed from $Z^+$-values of rules in $\mathcal{RZ}^+$ (step 3.b, Eqs. 4.10 and 4.11), they must be equal to $\kappa^+(\omega_r)$. As done in the proof of Theorem 3.13, let a characteristic possible world for a rule $r$ be the possible world $\omega^*$ with minimal ranking $\kappa^+$ verifying $r$. Thus, $Z^+(r) = \min_{\omega \models \varphi \wedge \psi} \kappa^+(\omega) + \delta = \kappa^+(\omega^*) + \delta$. We claim that $\kappa^+(\omega^*_{r^*})$¹⁰ is a characteristic possible world for the rules outside $\mathcal{RZ}^+$ it verifies. Suppose not: Assume that there is a possible world $\omega'$ such that $\omega'$ verifies a rule $r^*$ (that is inserted into $\mathcal{RZ}^+$ in step 3.c), and $\kappa^+(\omega') < \kappa^+(\omega^*_{r^*})$. Note that $\omega'$ must falsify a rule $r' \notin \mathcal{RZ}^+$. Otherwise the computation in Eq. 4.10 would not have used $\omega^*_{r^*}$ but $\omega'$ instead. Let $\omega_{r'}$ be a characteristic possible world for $r'$, then

$\kappa^+(\omega_{r'}) < \kappa^+(\omega')$  (A.29)

Note that $\omega^*_{r^*}$ cannot verify $r'$, since otherwise

$\kappa^+(\omega^*_{r^*}) < \kappa^+(\omega_{r'})$  (A.30)

a contradiction. If $\omega_{r'}$ does not verify the same rule $r^*$ that $\omega^*_{r^*}$ verifies, then $Z(r') > Z(r^*)$ by Step 3.c, and then by Eq. 4.11, $\kappa(\omega') > \kappa(\omega^*_{r^*})$, which is a contradiction. Therefore $\omega_{r'}$ verifies the same $r^*$, and by the minimality of $\omega^*_{r^*}$ among the worlds in $\Omega_{r^*}$, $\omega_{r'}$ must falsify a rule $r'' \notin \mathcal{RZ}^+$. If $\omega_{r''}$ is a characteristic possible world for $r''$ we have that

$\kappa^+(\omega_{r''}) < \kappa^+(\omega_{r'}) < \kappa^+(\omega')$  (A.31)

$\omega'$ cannot verify $r''$; otherwise we get the contradiction

$\kappa^+(\omega') < \kappa^+(\omega_{r'}) < \kappa^+(\omega')$  (A.32)

and if $\omega^*_{r^*}$ verifies $r''$ we get the contradiction of Eq. A.30. By similar arguments as before $\omega_{r''}$ must falsify another rule outside $\mathcal{RZ}^+$. However, given that $\Delta$ is finite, we cannot extend the "chain" of Eq. A.31 indefinitely, and therefore we are bound to get a contradiction in the form of Eq. A.30 or Eq. A.32. Since our only assumption was that $\omega^*_{r^*}$ is not a characteristic possible world for the rules it verifies, that assumption must be wrong. It follows then that the value of $Z(r^*)$ computed in step 3.b (Eq. 4.10) must be equal to $Z^+$. For the induction step assume that the invariant holds up to the $n$th iteration. Then by the same argument used in the basis of the induction, the $\kappa(\omega_r)$ for $\omega_r \in \Omega_r$ are equal to $\kappa^+(\omega_r)$, the minimal $\kappa^+(\omega^*_{r^*})$ in Eq. 4.10 must be a characteristic possible world for the rules $r^*$ outside of $\mathcal{RZ}^+$ that it verifies, and thus $Z(r^*) = \kappa^+(\omega^*_{r^*}) + \delta_{r^*} = Z^+(r^*)$. □

¹⁰ Recall that $r^*$ is a rule inserted into $\mathcal{RZ}^+$ in Step 3.c.
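
To make the fixed point in Definition 4.4 concrete, the following is a minimal sketch of a priority computation in the spirit of the procedure analyzed above. It is not the procedure itself: it enumerates worlds exhaustively (so it is exponential in the number of atoms), whereas the text works with satisfiability tests. The function names, the rule encoding as `(antecedent, consequent, delta)` triples, and the example rule set are all assumptions made for illustration.

```python
from itertools import product

def all_worlds(atoms):
    """Every truth assignment over the atoms, as a list of dicts."""
    return [dict(zip(atoms, bits))
            for bits in product([False, True], repeat=len(atoms))]

def verifies(w, rule):
    return rule[0](w) and rule[1](w)

def falsifies(w, rule):
    return rule[0](w) and not rule[1](w)

def kappa_plus(w, rules, Z):
    """Rank of a world: 0 if it falsifies nothing, else max Z of falsified rules + 1."""
    hit = [Z[i] for i, r in enumerate(rules) if falsifies(w, r)]
    return max(hit) + 1 if hit else 0

def z_plus_order(rules, atoms):
    """Brute-force priority assignment; raises ValueError if no rule can be placed."""
    ws = all_worlds(atoms)
    Z = {}
    # Rules tolerated by the whole set receive their own strength delta.
    for i, r in enumerate(rules):
        if any(verifies(w, r) and not any(falsifies(w, q) for q in rules) for w in ws):
            Z[i] = r[2]
    while len(Z) < len(rules):
        best = None
        for w in ws:
            falsified = [i for i, q in enumerate(rules) if falsifies(w, q)]
            new = [i for i, q in enumerate(rules) if i not in Z and verifies(w, q)]
            # Keep only worlds that falsify already-ranked rules and verify an unranked one.
            if not new or any(i not in Z for i in falsified):
                continue
            rank = max(Z[i] for i in falsified) + 1 if falsified else 0
            if best is None or rank < best[0]:
                best = (rank, new)
        if best is None:
            raise ValueError("no rule can be ranked; the set looks inconsistent")
        for i in best[1]:
            Z[i] = best[0] + rules[i][2]
    return [Z[i] for i in range(len(rules))]

# Hypothetical example: b -> f, p -> b, p -> not f, each with strength 1.
atoms = ["b", "f", "p"]
rules = [
    (lambda w: w["b"], lambda w: w["f"], 1),
    (lambda w: w["p"], lambda w: w["b"], 1),
    (lambda w: w["p"], lambda w: not w["f"], 1),
]
Z = z_plus_order(rules, atoms)
```

For this rule set the computed priorities can be checked directly against the two defining equations: each `Z[i]` equals the minimum `kappa_plus` over the worlds verifying rule `i`, plus that rule's strength.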

Lemma 4.10 Let $\Delta = \{r_i \mid r_i = \varphi_i \to \psi_i\}$ be a consistent set where the rules are sorted in nondecreasing order according to priorities $Z(r_i)$. Let $\kappa(M)$ be defined as in Eq. 4.7:

$\kappa(M) = \begin{cases} 0 & \text{if } M \text{ does not falsify any rule in } \Delta \\ \max_{M \models \varphi_i \wedge \neg\psi_i}[Z(r_i)] + 1 & \text{otherwise.} \end{cases}$  (A.33)

Then, for any wff $\varphi$, $\kappa(\varphi)$ can be computed in $O(\log |\Delta|)$ propositional satisfiability tests.

Proof: The idea is to perform a binary search on $\Delta$ to find the lowest $Z(r)$ such that there is a model for $\varphi$ that does not violate any rule $r'$ with priority $Z(r') > Z(r)$. We first divide $\Delta$ into two roughly equal sections: top-half ($r_{mid}$ to $r_{high}$) and bottom-half ($r_{low}$ to $r_{mid}$). Then we examine the top-half: If the wff $\alpha = \varphi \wedge \bigwedge_{j=mid}^{high} (\varphi_j \supset \psi_j)$ is satisfiable, then there exists a model for $\varphi$ that does not violate any rule in this top-half. It follows that $Z(r_{mid}) + 1$ is an upper bound on the value of $\kappa(\varphi)$, and the binary search is continued iteratively in the bottom-half. If, on the other hand, $\alpha$ is not satisfiable, then the maximum $Z(r_i)$ for any model for $\varphi$ must be in the top-half, and the search is continued there. Eventually, the set in which the search is conducted is reduced to one rule, and we can determine the value of $\kappa(\varphi)$ with one more satisfiability test. □
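
The binary search in this proof can be sketched as follows. The sketch searches for the smallest index `k` such that the query is consistent with the material implications of all rules from position `k` upward, which is one way to realize the halving argument; it is not claimed to match the text's bookkeeping exactly. The brute-force `satisfiable` function stands in for a real satisfiability procedure, and all names and the example priorities are assumptions for illustration.

```python
from itertools import product

def satisfiable(formula, atoms):
    """Stand-in for a SAT test: try every truth assignment."""
    return any(formula(dict(zip(atoms, bits)))
               for bits in product([False, True], repeat=len(atoms)))

def kappa_of(query, rules, atoms):
    """Rank of a wff, given rules sorted by nondecreasing priority Z.

    Each rule is (antecedent, consequent, Z). Uses O(log n) calls to
    `satisfiable` after one initial call on the query alone.
    """
    n = len(rules)

    def sat_suffix(k):
        # query AND (antecedent implies consequent) for every rule at index >= k
        return satisfiable(
            lambda w: query(w) and all((not a(w)) or c(w) for a, c, _ in rules[k:]),
            atoms,
        )

    if not sat_suffix(n):          # the query itself has no model
        return float("inf")
    lo, hi = 0, n                  # invariant: sat_suffix(hi) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if sat_suffix(mid):
            hi = mid               # some model avoids every rule from mid upward
        else:
            lo = mid + 1           # every model violates a rule at index >= mid
    # lo is the smallest k whose suffix is consistent with the query.
    return 0 if lo == 0 else rules[lo - 1][2] + 1

# Hypothetical sorted rule set with priorities 1, 3, 3.
atoms = ["b", "f", "p"]
rules = [
    (lambda w: w["b"], lambda w: w["f"], 1),
    (lambda w: w["p"], lambda w: w["b"], 3),
    (lambda w: w["p"], lambda w: not w["f"], 3),
]
```

Searching over suffixes works because the test is monotone: if the query is consistent with the rules from index `k` upward, it is also consistent with the rules from any larger index upward.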

Lemma 4.11 The value of $Z(\varphi \to \sigma)$ in Eq. 4.10 can be computed in $O(\log |\mathcal{RZ}^+|)$ satisfiability tests.

Proof: Let $\Delta'$ in Step 3(a) be equal to $\{\varphi_i \to \psi_i\}$, and let the wff $\alpha$ be equal to $\sigma \wedge \varphi \wedge \bigwedge_i (\varphi_i \supset \psi_i)$, where $i$ ranges over all the rules in $\Delta'$. Note that since any world $M_r$ in $\mathcal{M}_r$ is a model for $\sigma \wedge \varphi$ and does not violate any rule in $\Delta'$, it follows that $M_r \in \mathcal{M}_r$ iff $M_r \models \alpha$. Then, since $\kappa(\alpha) = \min_{M_r \in \mathcal{M}_r} \kappa(M_r)$, $Z(\varphi \to \sigma)$ must be equal to $\kappa(\alpha) + 1 + \delta$. Thus, once $\Delta'$ is sorted, by Lemma 4.10, $\kappa(\alpha)$ can be computed in $O(\log |\mathcal{RZ}^+|)$ satisfiability tests, which proves Lemma 4.11. □

Theorem 4.12 Given a consistent $\Delta$, the computation of the ranking $Z^+$ requires $O(|\Delta|^2 \times \log |\Delta|)$ satisfiability tests.

Proof: Step 1 requires at most $|\Delta|$ satisfiability tests and is performed once, while Step 2 takes at most $|\Delta|$ data assignments. Step 3(a) again requires $O(|\Delta|)$ satisfiability tests. Computing Eq. 4.10 in Step 3(b) can be done in $O(\log |\mathcal{RZ}^+|)$ satisfiability tests according to Lemma 4.11,¹¹ and since it will be executed at most $O(|\Delta|)$ times, it requires a total of $O(|\Delta| \times \log |\Delta|)$ satisfiability tests. Step 3(c) is a minimum search which can be done in conjunction with the computation of Eq. 4.10, since we only need to keep the minimum of such values. It involves $|\Delta|$ data comparisons. Loop 3 is performed at most $|\Delta| - |\Delta_0|$ times, hence the whole computation of the priorities $Z^+$ on rules requires a total of $O(|\Delta|^2 \times \log |\Delta|)$ satisfiability tests. □

Theorem 4.14 Let $r_1 : \varphi \to \psi$ and $r_2 : \varphi' \to \sigma$ be two rules in a consistent $\Delta$ such that

1. $\varphi \mathrel{|\!\sim} \varphi'$ (i.e., $\varphi$ is more specific than $\varphi'$).

2. There is no model satisfying $\varphi \wedge \psi \wedge \varphi' \wedge \sigma$ (i.e., $r_1$ conflicts with $r_2$).

Then $Z^+(r_1) > Z^+(r_2)$ independently of the values of $\delta_1$ and $\delta_2$.

Proof: If $\varphi \mathrel{|\!\sim} \varphi'$ is in every consequence relation of every $\kappa$ admissible with $\Delta$ then (by Prp. 3.8) the following constraint must hold in all these $\kappa$-rankings (including $\kappa^+$):

$\kappa(\varphi \wedge \varphi') < \kappa(\varphi \wedge \neg\varphi')$  (A.34)

Thus, any characteristic possible world $\omega_1^*$ for $r_1$ must render $\varphi'$ (the antecedent for $r_2$) true, and since there is no possible world such that both rules are verified (condition 2 in the theorem above), all $\omega_1^*$ must also falsify $r_2$. From Def. 4.4 (Eqs. 4.7 and 4.8): $\kappa^+(\omega_1^*) \geq Z^+(r_2) + 1$, and $Z^+(r_1) = \kappa^+(\omega_1^*) + \delta_1$. It follows that $Z^+(r_1) > Z^+(r_2)$. Note that the characteristic possible world for $r_2$ cannot in turn falsify $r_1$ since this will preclude the existence of an admissible ranking $\kappa$, and $\Delta$ was assumed to be consistent. □

¹¹ Note that we need $\mathcal{RZ}^+$ to be sorted, nondecreasingly, with respect to the priorities $Z$. This requires that the initial values inserted into $\mathcal{RZ}^+$ in Step 2 of Procedure Z+_order be sorted ($O(|\Delta_0|^2)$ data comparisons) and that the new $Z$-value in Step 3(c) be inserted in the right place ($O(|\mathcal{RZ}^+|)$ data comparisons). We are assuming that the cost of each of these operations is much less than that of a satisfiability test.
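
As an illustration of this theorem, consider a small rule set that is an assumption made here and not an example from the text: $r_1 : p \to \neg f$, $r_2 : b \to f$, and $r_3 : p \to b$, with arbitrary nonnegative integer strengths $\delta_1, \delta_2, \delta_3$. Here $p$ is more specific than $b$ (by $r_3$) and no world satisfies $p \wedge \neg f \wedge b \wedge f$. Using only the two defining equations quoted in the proofs above (a world's rank is the highest priority it falsifies plus one, and a rule's priority is the lowest rank of a verifying world plus its strength), the priorities work out as:

```latex
\begin{align*}
Z^+(r_2) &= \kappa^+(b \wedge f \wedge \neg p) + \delta_2 = 0 + \delta_2 = \delta_2, \\
\kappa^+(p \wedge b \wedge \neg f) &= Z^+(r_2) + 1 = \delta_2 + 1, \\
Z^+(r_3) &= \kappa^+(p \wedge b \wedge \neg f) + \delta_3 = \delta_2 + \delta_3 + 1, \\
\kappa^+(p \wedge \neg b \wedge \neg f) &= Z^+(r_3) + 1 = \delta_2 + \delta_3 + 2, \\
Z^+(r_1) &= \min\{\delta_2 + 1,\; \delta_2 + \delta_3 + 2\} + \delta_1
          = \delta_1 + \delta_2 + 1 \;>\; Z^+(r_2).
\end{align*}
```

The inequality holds for every choice of the strengths, since the characteristic world of $r_1$ is forced to falsify $r_2$.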

Theorem 5.2 Given a network $\Delta$, let $O_1(X)$ and $O_2(X)$ be two orderings of the elements in $X$ according to $\Delta$. If $\kappa$ is stratified for $\Delta$ under $O_1(X)$, then $\kappa$ is stratified for $\Delta$ under $O_2(X)$.

Proof: Let $\mathcal{X} = \{X_1, \ldots, X_n\}$ be the set of literal variables in the language taking values from the atomic propositions $X = \{x_1, \ldots, x_n\}$. Let $O_1 = [Y_1, \ldots, Y_n]$ and $O_2 = [Z_1, \ldots, Z_n]$, where $\{X_1, \ldots, X_n\} = \{Y_1, \ldots, Y_n\} = \{Z_1, \ldots, Z_n\}$ and $[\,]$ denotes sequences. We will show that for $1 \leq i \leq n$

$\kappa(Z_i \wedge \ldots \wedge Z_1) = \sum_{j=1}^{i} \kappa(Z_j \mid Par_{Z_j})$  (A.35)

given that for $1 \leq k \leq n$

$\kappa(Y_k \wedge \ldots \wedge Y_1) = \sum_{j=1}^{k} \kappa(Y_j \mid Par_{Y_j})$  (A.36)

The proof is by induction on $i$. The base case where $i = 1$ is trivial. For the induction step we show that the statement is true for $i = m$. Let $Y_l$ be the last element in the smallest subsequence of $O_1(\mathcal{X})$ such that $\{Z_1, \ldots, Z_m\} \subseteq \{Y_1, \ldots, Y_l\}$. Let $\{Y_r, \ldots, Y_s\} = \{Y_1, \ldots, Y_l\} - \{Z_1, \ldots, Z_m\}$. Since $\kappa$ is stratified for $\Delta$ with respect to $O_1$ we have that

$\kappa(Y_l \wedge \ldots \wedge Y_1) = \sum_{j=1}^{l} \kappa(Y_j \mid Par_{Y_j})$  (A.37)

Since $\{Y_1, \ldots, Y_l\} = \{Y_r, \ldots, Y_s\} \cup \{Z_1, \ldots, Z_m\}$, and both orderings are based on the same underlying graph, we can re-write Eq. A.37 as

$\kappa(Y_l \wedge \ldots \wedge Y_1) = \sum_{j=r}^{s} \kappa(Y_j \mid Par_{Y_j}) + \sum_{j=1}^{m} \kappa(Z_j \mid Par_{Z_j})$  (A.38)

which is equivalent to

$\min_{Y_r, \ldots, Y_s} \big(\kappa(Y_l \wedge \ldots \wedge Y_1)\big) = \min_{Y_r, \ldots, Y_s} \Big(\sum_{j=r}^{s} \kappa(Y_j \mid Par_{Y_j})\Big) + \sum_{j=1}^{m} \kappa(Z_j \mid Par_{Z_j})$  (A.39)

It follows from the ranking properties in Eqs. 4.1-4.2 that

$\min_{Y_r, \ldots, Y_s} \Big(\sum_{j=r}^{s} \kappa(Y_j \mid Par_{Y_j})\Big) = 0$  (A.40)

since for any $\kappa(Y \mid Par_Y)$ either $\kappa(y \mid Par_y) = 0$ or $\kappa(\neg y \mid Par_y) = 0$ or both. Also

$\min_{Y_r, \ldots, Y_s} \big(\kappa(Y_l \wedge \ldots \wedge Y_1)\big) = \kappa(Z_m \wedge \ldots \wedge Z_1)$  (A.41)

Thus,

$\kappa(Z_m \wedge \ldots \wedge Z_1) = \sum_{j=1}^{m} \kappa(Z_j \mid Par_{Z_j})$  (A.42)

□
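
The order-independence claimed by this theorem can be checked numerically on a toy network. The sketch below assumes a ranking built as a sum of conditional ranks, one per variable given its parents, with each table row having minimum 0 (the normalization used in the proof). The graph, the rank tables, and the function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical DAG a -> c <- b; parents[v] lists the parents of v.
parents = {"a": [], "b": [], "c": ["a", "b"]}

# Hypothetical conditional ranks: cond[v][parent values] = (rank of v False, rank of v True).
# Every row contains a 0, as required of a conditional ranking.
cond = {
    "a": {(): (0, 2)},
    "b": {(): (1, 0)},
    "c": {(False, False): (0, 3), (False, True): (0, 1),
          (True, False): (2, 0), (True, True): (0, 0)},
}

def cond_rank(v, assignment):
    """Conditional rank of v's value given its parents' values in `assignment`."""
    pa = tuple(assignment[p] for p in parents[v])
    return cond[v][pa][int(assignment[v])]

def kappa_world(world):
    """Rank of a complete world: sum of the conditional ranks."""
    return sum(cond_rank(v, world) for v in parents)

def kappa_partial(partial):
    """Rank of a partial instantiation: minimum over the worlds extending it."""
    free = [v for v in parents if v not in partial]
    return min(
        kappa_world({**partial, **dict(zip(free, bits))})
        for bits in product([False, True], repeat=len(free))
    )

def is_stratified(order):
    """Check that every prefix of `order` has rank equal to the sum of its conditional ranks."""
    for i in range(1, len(order) + 1):
        prefix = order[:i]
        for bits in product([False, True], repeat=i):
            partial = dict(zip(prefix, bits))
            if kappa_partial(partial) != sum(cond_rank(v, partial) for v in prefix):
                return False
    return True
```

Both `["a", "b", "c"]` and `["b", "a", "c"]` are orderings consistent with the graph, and the check succeeds for each, because the variables minimized out always contribute a conditional rank of 0.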

Theorem 5.6 Let $\Delta$ be a network, and let $\{p_r, \ldots, p_s\}$ be a set of literals corresponding to the parent set $\{x_r, \ldots, x_s\}$ of $x_t$ (each $p_i$, $r \leq i \leq s$, is either $x_i$ or $\neg x_i$). Let $e_{x_t}$ denote a literal built on $x_t$, and let $Y = \{y_1, \ldots, y_m\}$ be a set of atomic propositions such that no $y_i \in Y$ is a descendant of $x_t$ in $\Gamma_{(X,\Delta)}$. Let $\varphi_Y$ be any wff built only with elements from $Y$ such that $\varphi_Y \wedge p_r \wedge \ldots \wedge p_s$ is satisfiable. If $p_r \wedge \ldots \wedge p_s \mathrel{|\!\sim} e_{x_t}$ then $\varphi_Y \wedge p_r \wedge \ldots \wedge p_s \mathrel{|\!\sim} e_{x_t}$.

Proof: If $\{p_r, \ldots, p_s\}$ is the parent set of $x_t$, and no $y_i \in Y$ is a descendant of $x_t$, then we can select an ordering $O$ such that all the variables in $Y$ occur before $x_t$. By Eq. 5.2

$\kappa(X_t \mid P_r \wedge \ldots \wedge P_s) = \kappa(X_t \mid P_r \wedge \ldots \wedge P_s \wedge Y_m \wedge \ldots \wedge Y_1)$  (A.43)

By Theorem 5.2, Eq. A.43 must be true in every stratified ranking. Thus, if $\kappa(\neg e_{x_t} \mid p_s \wedge \ldots \wedge p_r) > 0$ then $\kappa(\neg e_{x_t} \mid p_s \wedge \ldots \wedge p_r \wedge Y_m \wedge \ldots \wedge Y_1) > 0$ for any instantiation of the variables $Y_i$, $1 \leq i \leq m$, and the theorem follows. □

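Theorem 5.7 Let $X' \subseteq X$ and $\Delta' \subseteq \Delta$ such that all rules in $\Delta'$ are built with atomic propositions in $X'$, and if $x' \in X'$ then all the rules in $\Delta$ with either $x'$ or $\neg x'$ as their consequent are also in $\Delta'$. Let $\varphi$ and $\psi$ be two wffs built with elements from $X'$. If $\varphi \mathrel{|\!\sim} \psi$ with respect to $\Delta'$, then $\varphi \mathrel{|\!\sim} \psi$ with respect to $\Delta$.

Proof: Note that any stratified ranking for $\Delta$ must also be a stratified ranking for $\Delta'$. Therefore if $\kappa(\neg\psi \mid \varphi) > 0$ in every stratified ranking for $\Delta'$, then $\kappa(\neg\psi \mid \varphi) > 0$ in every stratified ranking for $\Delta$, and the theorem follows. □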

Theorem 5.14 Let $\psi$ be a wff representing a set of beliefs. Let $\kappa$ be a ranking such that $\omega \in Mods(\psi)$ iff $\kappa(\omega) = 0$. Let $\varphi$ represent a conjunction of literals, and let $\kappa_{do(\varphi)}$ be the ranking that results from updating $\kappa$ by $\varphi$ as shown in Eq. 5.36 such that $\omega^* \in Mods(\psi \circ \varphi)$ iff $\kappa_{do(\varphi)}(\omega^*) = 0$. Then

$Mods(\psi \circ \varphi) = \bigcup_{\omega \in Mods(\psi)} \min(Mods(\varphi), <_\omega).$  (A.44)

Proof: Let us first assume that the wff $\varphi$ is equal to the single proposition $a$. The generalization to the case of a conjunction is straightforward, and follows the lines of Eq. 5.37. Let $\{x_1, \ldots, a, \ldots, x_n\}$ be the set of atomic propositions in the language and let $O$ be an ordering of these propositions consistent with the underlying graph $\Gamma_{(X,\Delta)}$ for $\Delta$. $Pred_a$ and $Succ_a$ will denote the set of atomic propositions that precede and succeed $a$ in $O$ (respectively). Let $\kappa$ denote the ranking responsible for $\psi$, and $\kappa_{do(a)}$ represent the ranking after $\kappa$ is updated by $a$ according to Eq. 5.36. In other words, $\omega \in Mods(\psi)$ iff $\kappa(\omega) = 0$ and $\omega' \in Mods(\psi \circ a)$ iff $\kappa_{do(a)}(\omega') = 0$. We first show that

$\bigcup_{\omega \in Mods(\psi)} \min(Mods(a), <_\omega) \subseteq Mods(\psi \circ a)$  (A.45)

Since $\omega \in Mods(\psi)$, by stratification

$\sum_{i=1}^{n} \kappa(x_i \mid Par_{X_i}(\omega)) = 0$  (A.46)

If $\omega \models a$ then we are done: First, $\omega$ is a model for $\varphi$; second, $\omega$ is trivially minimal (or nearest to itself) in $<_\omega$; and third, since by Eq. A.46 $\kappa(\omega) = \kappa(a \mid Par_A(\omega)) = 0$, it follows from Eq. 5.36 that $\kappa_{do(a)}(\omega) = 0$, and therefore $\omega \in Mods(\psi \circ a)$. Assume that $\omega \models \neg a$. We construct an $\omega^* \models a$ such that $\omega^*$ is minimal in $<_\omega$ (following Def. 5.13) and show that $\kappa_{do(a)}(\omega^*) = 0$. The first condition in Definition 5.13 states that $\omega_1 <_\omega \omega_2$ iff

1. $\omega$ disagrees with $\omega_2$ on a literal that is earlier (in $O$) than any literal on which $\omega$ disagrees with $\omega_1$.

Since any world in $Mods(\psi \circ a)$ must disagree with $\omega$ on $a$ (recall that $\omega \models \neg a$), in order to make $\omega^*$ minimal with respect to $<_\omega$ we force $\omega^*$ to coincide with $\omega$ in all propositions in $Pred_a$. From the properties of ranking functions (Eqs. 4.1-4.3), either $\kappa(x \mid Par_x) = 0$ or $\kappa(\neg x \mid Par_x) = 0$ (or both). Thus, we can always complete the truth assignment for $\omega^*$ in such a way that for every $x_j \in Succ_a$, $\kappa(X_j \mid Par_{X_j}(\omega^*)) = 0$. It follows then that

$\kappa(\omega^*) = \kappa(a \mid Par_A(\omega^*)),$  (A.47)

which is the minimal $\kappa$-value that a model for $a$ can have given the additional constraint that propositions in $Pred_a$ coincide with $\omega$. It follows that $\omega^*$ is minimal in $<_\omega$ (see Condition 2 in Def. 5.13). Moreover, from Eq. 5.36 and Eq. A.47, it follows that $\kappa_{do(a)}(\omega^*) = 0$, and therefore $\omega^* \in Mods(\psi \circ a)$.

We now show that

$Mods(\psi \circ a) \subseteq \bigcup_{\omega \in Mods(\psi)} \min(Mods(a), <_\omega)$  (A.48)

Consider an arbitrary $\omega^* \in Mods(\psi \circ a)$. By Eq. 5.36, we know that $\kappa(\omega^*) = \kappa(a \mid Par_A(\omega^*))$. If $\kappa(a \mid Par_A(\omega^*)) = 0$, then we are done; $\kappa(\omega^*) = 0$ and therefore $\omega^* \in Mods(\psi)$ ($\omega^* \models \psi$). Moreover, $\omega^*$ is trivially minimal with respect to $<_{\omega^*}$. If, on the other hand, $\kappa(a \mid Par_A(\omega^*)) > 0$, then

$\kappa(\neg a \mid Par_A(\omega^*)) = 0.$  (A.49)

We build a world $\omega$ such that $\omega^*$ is minimal in $<_\omega$. This construction proceeds in a similar way as above. First, all propositions in $Pred_a$ must coincide between $\omega$ and $\omega^*$. Second, we complete the world $\omega$ by making sure that for all $x_j \in Succ_a$, $\kappa(X_j \mid Par_{X_j}(\omega)) = 0$. Thus, $\omega^*$ is minimal in $<_\omega$, and from A.49, $\kappa(\omega) = 0$, which implies that $\omega \in Mods(\psi)$.

For the generalization to the case of $\varphi$ being a conjunction of literals $\varphi = a_1 \wedge \neg a_2 \ldots$, we simply use Eq. 5.37 instead of Eq. 5.36. □

APPENDIX B


The  Lagrange  Multipliers  Technique. 


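We present a step-by-step application of the Lagrange multipliers technique on a set of $n$ active constraints (rules):

1. Multiply each of the constraint equations by a Lagrange multiplier $\lambda$. Thus

$\lambda_0 \times \big[\sum_{\Omega} P(\omega) - 1\big] = 0$  (B.1)

$\lambda_i \times [P(\neg\psi_i \wedge \varphi_i) - C_i\varepsilon \times P(\psi_i \wedge \varphi_i)] = 0$ where $1 \leq i \leq n$  (B.2)

2. Add the left-hand side of each equation to the objective function and obtain

$H[P] = -\sum_{\Omega} P(\omega) \log P(\omega) + \lambda_0 \times \big[\sum_{\Omega} P(\omega) - 1\big] + \lambda_1 \times [P(\neg\psi_1 \wedge \varphi_1) - C_1\varepsilon \times P(\psi_1 \wedge \varphi_1)] + \cdots + \lambda_n \times [P(\neg\psi_n \wedge \varphi_n) - C_n\varepsilon \times P(\psi_n \wedge \varphi_n)]$  (B.3)

3. Differentiate this equation with respect to each term $P(\omega)$ of the distribution, equate it to zero and (after some algebraic manipulations) get:

$P(\omega) = e^{(\lambda_0 - 1)} \times \prod_{r_i \in D_\omega^-} e^{\lambda_i} \times \prod_{r_j \in D_\omega^+} e^{-\lambda_j C_j \varepsilon}$  (B.4)

where $D_\omega^-$ denotes the set of rules falsified in $\omega$ and $D_\omega^+$ denotes the set of rules verified in $\omega$.

4. Performing the variable substitutions

$\alpha_0 = e^{(\lambda_0 - 1)}$
$\alpha_{r_k} = e^{\lambda_k}$

in equation (B.4) will yield equation (3.14).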
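
The algebra skipped in step 3 can be spelled out as follows. This is a sketch under the assumption that each active constraint has the form $P(\neg\psi_i \wedge \varphi_i) - C_i\varepsilon \, P(\psi_i \wedge \varphi_i) = 0$, so that a world $\omega$ contributes $+\lambda_i$ for each rule it falsifies and $-\lambda_j C_j \varepsilon$ for each rule it verifies; the symbols $D_\omega^-$ and $D_\omega^+$ denote the sets of rules falsified and verified in $\omega$.

```latex
\begin{align*}
\frac{\partial H[P]}{\partial P(\omega)}
  &= -\log P(\omega) - 1 + \lambda_0
     + \sum_{r_i \in D_\omega^-} \lambda_i
     - \sum_{r_j \in D_\omega^+} \lambda_j C_j \varepsilon = 0 \\
\log P(\omega)
  &= (\lambda_0 - 1)
     + \sum_{r_i \in D_\omega^-} \lambda_i
     - \sum_{r_j \in D_\omega^+} \lambda_j C_j \varepsilon \\
P(\omega)
  &= e^{(\lambda_0 - 1)}
     \prod_{r_i \in D_\omega^-} e^{\lambda_i}
     \prod_{r_j \in D_\omega^+} e^{-\lambda_j C_j \varepsilon}
   = \alpha_0 \prod_{r_i \in D_\omega^-} \alpha_{r_i}
     \prod_{r_j \in D_\omega^+} \alpha_{r_j}^{-C_j \varepsilon}
\end{align*}
```

The last equality applies the substitutions of step 4, $\alpha_0 = e^{(\lambda_0 - 1)}$ and $\alpha_{r_k} = e^{\lambda_k}$.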